Audit Feature

Overview

The audit feature provides the count of messages received at the various TIERS in the conduit setup.

Each tier publishes its audit messages to the '_audit' topic, which is internal to the conduit stack. These audit messages flow through conduit in the same way as messages of other topics generated by the publisher library. When the publisher library generates a message for any topic, it prepends the message with an immutable header that contains the generation timestamp (and other fields). As messages flow through a tier, that tier tracks the number of messages seen for various publish time periods and periodically generates an audit message capturing this information.
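
As an illustration of the bucketing arithmetic (this is not conduit code; the timestamp value is an arbitrary example), a 60-second window maps each generation timestamp to the start of the window it is counted in:

    # Map a message generation time (epoch seconds) to its 60-second bucket.
    ts=1366792537          # example generation timestamp from the message header
    window=60              # bucket granularity in seconds
    bucket=$(( ts / window * window ))
    echo "$bucket"         # prints 1366792500, the start of this message's window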

The following conduit TIERS are supported in the first phase of the audit feature: publisher, agent, collector, hdfs. The next phase will support the remaining TIERS: local stream, merged stream, mirror stream. Note that the set of TIERS is defined internally within conduit; TIERS cannot be added externally.

Note that if audit messages are lost or replayed, false negatives are possible; if both the normal messages and their associated audit messages are lost, false positives are possible.

Version

Phase 1 of the audit feature is available in the following versions:

  • publisher library: 2.x
  • scribe agent: 0.4.1 or later
  • scribe collector: 0.4.2 or later

Configuration Setup

Publisher Configuration

By default, the audit feature is turned off in the messaging-client publisher.

To turn the audit feature on, add the property below to messaging-publisher-conf.properties:

audit.enabled=true

Scribe Configuration

By default, the audit feature is turned OFF in the scribe agent and collector tiers.

To turn audit ON, add a new store with _audit as the category in both the agent and collector configurations.

Sample configuration for agent: scribe-agent.conf

Sample configuration for collector: scribe-collector.conf

The notes section in each of the links above clearly mentions which configuration settings can be manually modified. Please follow it carefully.

New configuration variables

tier: The logical TIER that the scribe process is a part of.

  • Supported values are "agent" and "collector"

window_size: The granularity (in seconds) at which incoming messages will be bucketed while generating audit stats.

  • Default value is 60. This means that all messages generated by the publisher within a 60-second time window will be counted together. A minimal store sketch using these variables follows below.
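
For illustration, here is a minimal sketch of an agent-side audit store. The store layout, the placement of the new variables, and all hosts, ports, and paths below are assumptions; always start from the bundled scribe-agent.conf sample:

    # Illustrative only: values and the exact placement of the new
    # variables are assumptions; refer to the bundled scribe-agent.conf.
    tier=agent
    window_size=60

    <store>
    category=_audit
    type=buffer

    <primary>
    type=network
    remote_host=collector-host
    remote_port=1463
    </primary>

    <secondary>
    type=file
    fs_type=std
    file_path=/var/spool/scribe/_audit
    </secondary>
    </store>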

Audit Query Tool

The audit feature provides a generic command line tool to execute audit queries. A query can return the stats for any topic at any tier for any given time period. The stats currently emitted are the number of messages received by each tier and the latency from source, expressed as percentiles.

The audit query command is exposed via the audit-client script, which is bundled in the conduit-audit deb package.

Usage

Run the audit-client script from the installed location of the conduit-audit package (/usr/local/conduit-audit):

 
 bin/audit-client audit [-group <TIER,HOSTNAME,TOPIC,CLUSTER>]
          [-filter <TIER=xxx,HOSTNAME=xxx,TOPIC=xxx,CLUSTER=xxx>] [-percentile <list of percentiles>]
          <dd-mm-yyyy-HH:mm> <dd-mm-yyyy-HH:mm> --conf <confdir>
          

Example:

Consider a setup where publisher, scribe agent, scribe collector and HDFS processes are running on host conduitdev1. The user wants to audit all messages generated by the publisher library for the topic 'benchmark_merge' between 08:44 and 09:04 on 24/04/2013.

Here is an example query to get audit stats and the expected output when there is no data loss:

        bin/audit-client audit -group TIER,HOSTNAME,TOPIC -filter topic=benchmark_merge -percentile 99
         24-04-2013-08:44 24-04-2013-09:04  --conf /usr/local/messaging-client/conf 
        
        Warning: JAVA_HOME not set!
        Displaying results for AuditDbQuery [fromTime=24-04 08:44, toTime=24-04 09:04, cutoffTime=1,
        groupBy=GroupBy[HOSTNAME, TOPIC, TIER], filter=Filter{TOPIC=benchmark_merge},
        timeout=120000, rootdir=hdfs://localhost:9000/conduit]
                
        [{"HOSTNAME":"localhost","TOPIC":benchmark_merge,"Received":60000,"CLUSTER":"conduitdev1","TIER":"publisher",
        "Latencies":{"99.0":1}},{"HOSTNAME":"localhost","TOPIC":benchmark_merge,"Received":60000,"CLUSTER":"conduitdev1",
        "TIER":"AGENT","Latencies":{"99.0":1},{"HOSTNAME":"localhost","TOPIC":benchmark_merge,"Received":60000,
        "CLUSTER":"conduitdev1","TIER":"collector","Latencies":{"99.0":1}},{"HOSTNAME":"localhost","TOPIC":benchmark_merge,
        "Received":60000,"CLUSTER":"conduitdev1","TIER":"hdfs","Latencies":{"99.0":2}}]

Within a -filter argument, multiple values for the same column are separated by the '|' symbol, whereas columns are separated by ','. For example:

-filter TOPIC=benchmark_merge|benchmark_local,TIER=hdfs|agent
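
Because '|' is a shell pipe character, the filter value should be quoted on the command line. Here is a hypothetical full query using this filter, reusing the time range and conf directory from the example above:

    bin/audit-client audit -group TIER,TOPIC \
        -filter 'TOPIC=benchmark_merge|benchmark_local,TIER=hdfs|agent' \
        -percentile 99 24-04-2013-08:44 24-04-2013-09:04 \
        --conf /usr/local/messaging-client/conf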

The <confdir> location should contain an audit consumer configuration file named "audit-consumer-conf.properties" for consuming the audit messages.


Audit Phase 2

Audit Phase 2 is an extension of the audit feature that provides an end-to-end view of the data being transferred by conduit.

The previous phase provided metrics only up to HDFS; this phase adds the remaining tiers: LOCAL, MERGE and MIRROR.

The conduit worker uses the messaging publisher to publish audit messages to scribe, which in turn writes them to HDFS.

The metrics emitted are the same as for the earlier tiers: the number of messages received by a tier in a given time frame.

The audit query tool also remains unchanged.

Versions

Audit Phase 2 is available from conduit worker version 2.2.0 onwards.

Installation

1) Set up a local scribe collector on each worker box and whitelist the stream "_audit".

2) Since the messaging publisher is now used by the worker, a configuration file for the scribe messaging publisher is also needed. The path of this configuration file is set via the property "com.inmobi.conduit.audit.publisher.config" in conduit.cfg, as shown in the sketch below.

3) Upgrade the conduit worker to the latest version and restart the service.
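
As a minimal sketch, step 2 amounts to one line in conduit.cfg; the file path on the right-hand side is an assumption and should point at your actual publisher config file:

    # conduit.cfg (illustrative; the config file path is an assumption)
    com.inmobi.conduit.audit.publisher.config=/usr/local/conduit-worker/conf/audit-publisher-conf.properties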

PS: a) There is no need to set DISTCP_HOME, as distcp is now bundled as part of the worker.

b) From this version onwards, conduit uses reduce slots as well. Both the local stream service and the merge/mirror services now run 1 reducer per job.

Audit Feeder

The audit feeder runs as a daemon and is responsible for reading the audit messages generated in all the colos, then processing and feeding them into a central DB. The daemon leverages pintail to read the audit messages from the different colos.

The feeder can be started using the audit-client script, which is part of the conduit-audit package.

Install

Steps to install the feeder:

  1. Install the debian package conduit-audit.
  2. After installation, replace the sample config file located at /usr/local/conduit-audit-<version>/conf/audit-feeder.properties with the actual config file.
  3. The mandatory fields to be set are: db.url, db.username, db.password, messaging.consumer.checkpoint.dir, feeder.conduit.conf, audit.table.master. A description of each of these config parameters is provided in the sample config file packaged as part of the debian; a minimal sketch of these fields follows below.
  4. After replacing the config, start the feeder following the steps given in the "Start" section.
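
As a hedged sketch, the mandatory fields might look like the following. Every value below is a placeholder; the authoritative descriptions live in the packaged sample config:

    # audit-feeder.properties (illustrative values only; see the packaged
    # sample config for authoritative descriptions of each parameter)
    db.url=jdbc:mysql://dbhost:3306/audit
    db.username=audit_user
    db.password=audit_password
    messaging.consumer.checkpoint.dir=/var/lib/conduit-audit/checkpoint
    feeder.conduit.conf=/usr/local/conduit/conf/conduit.xml
    audit.table.master=audit_master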

Start

The command to start the feeder is:

cd /usr/local/conduit-audit-<version>
nohup bin/audit-client feeder --conf <path to conf directory> 2>&1 &

The conf directory must contain:

  1. audit-feeder.properties
  2. core-site.xml (part of the conduit-audit package)


A detailed template of audit-feeder.properties is part of the conduit-audit package. Please refer to it for all other config params.

Stop

To stop the feeder, issue:

kill -TERM <pid of feeder>
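
If the pid was not recorded when the feeder was started, one way to look it up is sketched below; the grep pattern assumes the feeder's command line mentions "feeder":

    # Illustrative: find the feeder's pid, then stop it gracefully.
    ps aux | grep -i feeder | grep -v grep
    kill -TERM <pid of feeder>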

Audit Rollup

Audit rollup is documented on a separate page.