The audit feature tracks the flow of messages across the TIERS in the conduit setup. Audit messages are generated on the '_audit' topic, which is internal to the conduit stack. These audit messages flow through conduit in the same way as messages of other topics generated by the publisher library. When the publisher library generates a message for any topic, it prepends the message with an immutable header that contains the generation timestamp (and other fields). As messages flow through a tier, that tier tracks the number of messages seen for various publish time periods and periodically generates an audit message capturing this information.
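The per-tier bucketing described above can be sketched as follows. This is a minimal illustration under stated assumptions, not conduit's actual implementation; all names here are invented:

```python
from collections import defaultdict

WINDOW_SIZE = 60  # seconds; analogous to the window_size config parameter

def window_start(publish_ts):
    """Align a publish timestamp (epoch seconds) to its window boundary."""
    return publish_ts - (publish_ts % WINDOW_SIZE)

def count_by_window(messages):
    """Count messages per (topic, window) bucket, roughly as a tier would
    before emitting an audit message on the '_audit' topic.
    `messages` is a list of (topic, publish_ts) pairs."""
    counts = defaultdict(int)
    for topic, publish_ts in messages:
        counts[(topic, window_start(publish_ts))] += 1
    return dict(counts)

# Three messages published in one minute-window, one in the next:
msgs = [("benchmark_merge", 100), ("benchmark_merge", 110),
        ("benchmark_merge", 119), ("benchmark_merge", 125)]
print(count_by_window(msgs))
# {('benchmark_merge', 60): 3, ('benchmark_merge', 120): 1}
```

Because the bucket key comes from the publish timestamp in the immutable header, every tier buckets the same message into the same window, which is what makes the per-tier counts comparable.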
The following conduit TIERS are supported in the first phase of the audit feature: publisher, agent, collector, hdfs. The next phase will support the remaining TIERS: local stream, merged stream, mirror stream. Note that the set of TIERS is defined internally within conduit; tiers cannot be added externally.
Note that if audit messages are lost or replayed, false negatives are possible, whereas if both the normal messages and their associated audit messages are lost, false positives are possible.
Phase 1 of the audit feature is available in the following versions:
publisher library: 2.x
scribe agent: 0.4.1 or later
scribe collector: 0.4.2 or later
By default, audit feature is turned off in messaging-client publisher.
To turn the audit feature on, add the following property to messaging-publisher-conf.properties:
audit.enabled=true
By default, audit feature is turned OFF in scribe agent and collector tiers.
To turn audit ON, add a new store with *_audit* as the category in the agent and collector configurations.
Sample configuration for agent: scribe-agent.conf
Sample configuration for collector: scribe-collector.conf
The notes section in each of the links above clearly mentions which configuration can be manually modified. Please follow it carefully.
tier: The logical TIER that the scribe process is a part of.
window_size: The granularity (in seconds) at which incoming messages will be bucketed while generating audit stats.
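A hypothetical fragment of an audit store entry is shown below to illustrate where these two parameters fit. The store type and other values are assumptions for illustration only; the authoritative settings are in the sample configuration files linked above:

```
# Illustrative fragment only -- see scribe-agent.conf / scribe-collector.conf
<store>
category=_audit
tier=agent          # the logical TIER this scribe process is part of
window_size=60      # bucket granularity (seconds) for audit stats
</store>
```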
The audit feature provides a generic command-line tool to execute audit queries. A query can return the stats for any topic at any tier for any given time period. The stats currently emitted are the number of messages received by each tier and the latency from source, expressed as percentiles.
The audit query command is exposed as part of audit-client script that is bundled as part of conduit-audit deb package.
Run audit-client script from the installed location of conduit-audit package (/usr/local/conduit-audit):
bin/audit-client audit [-group <TIER,HOSTNAME,TOPIC,CLUSTER>] [-filter <TIER=xxx,HOSTNAME=xxx,TOPIC=xxx,CLUSTER=xxx>] [-percentile <list of percentiles>] <dd-mm-yyyy-HH:mm> <dd-mm-yyyy-HH:mm> --conf <confdir>
Example:
Consider a setup where the publisher, scribe agent, scribe collector and HDFS processes are running on host conduitdev1. Suppose the user wants to audit all messages generated by the publisher library for the topic 'benchmark_merge' between 08:44 and 09:04 on 24/04/2013.
Here is an example query to get audit stats and the expected output when there is no data loss:
bin/audit-client audit -group TIER,HOSTNAME,TOPIC -filter topic=benchmark_merge -percentile 99 24-04-2013-08:44 24-04-2013-09:04 --conf /usr/local/messaging-client/conf

Warning: JAVA_HOME not set!
Displaying results for AuditDbQuery [fromTime=24-04 08:44, toTime=24-04 09:04, cutoffTime=1, groupBy=GroupBy[HOSTNAME, TOPIC, TIER], filter=Filter{TOPIC=benchmark_merge}, timeout=120000, rootdir=hdfs://localhost:9000/conduit]
[{"HOSTNAME":"localhost","TOPIC":"benchmark_merge","Received":60000,"CLUSTER":"conduitdev1","TIER":"publisher","Latencies":{"99.0":1}},
 {"HOSTNAME":"localhost","TOPIC":"benchmark_merge","Received":60000,"CLUSTER":"conduitdev1","TIER":"AGENT","Latencies":{"99.0":1}},
 {"HOSTNAME":"localhost","TOPIC":"benchmark_merge","Received":60000,"CLUSTER":"conduitdev1","TIER":"collector","Latencies":{"99.0":1}},
 {"HOSTNAME":"localhost","TOPIC":"benchmark_merge","Received":60000,"CLUSTER":"conduitdev1","TIER":"hdfs","Latencies":{"99.0":2}}]
Multiple values for a filter column are separated by the '|' symbol, whereas columns are separated by ','. For example:
-filter TOPIC=benchmark_merge|benchmark_local,TIER=hdfs|agent
The <confdir> location should have audit consumer configuration file named "audit-consumer-conf.properties" for consuming the audit messages.
Audit Phase 2 is an extension of the audit feature that provides an end-to-end view of the data being transferred by conduit.
The previous phase provided metrics only up to HDFS; this phase adds the remaining tiers: LOCAL, MERGE and MIRROR.
Conduit worker uses messaging publisher to publish audit messages to scribe which in turn writes to HDFS.
The metrics emitted are the same as for the previous tiers: the number of messages received by a tier in a given time frame.
The audit query also remains unchanged.
1) Set up a local scribe collector on each worker box and whitelist the stream "_audit".
2) Since the conduit worker now uses the messaging publisher, a configuration file for the scribe messaging publisher is also required.
The path of the messaging publisher configuration is configurable via the property "com.inmobi.conduit.audit.publisher.config" in conduit.cfg.
3) Upgrade the conduit worker to the latest version and restart the service.
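The conduit.cfg entry from step 2 would look like the fragment below. The path shown is an illustrative assumption; only the property name comes from this document:

```
# conduit.cfg fragment -- the value is an example path, adjust for your setup
com.inmobi.conduit.audit.publisher.config=/usr/local/conduit/conf/audit-publisher-conf.properties
```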
PS: a) There is no need to set up DISTCP_HOME, as distcp is now bundled as part of the worker.
b) From this version onwards, conduit uses reduce slots as well. The local stream service and the merge/mirror services each run 1 reducer per job now.
The audit feeder runs as a daemon and is responsible for reading the audit messages generated from all the colos, then processing and feeding them into a central DB. This daemon leverages the power of pintail to read audit messages from the different colos.
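The feeder's aggregation step can be sketched roughly as below. This is an illustration only, with invented names and shapes; the real daemon consumes via pintail and writes to a central DB:

```python
from collections import defaultdict

def merge_audit_counts(colo_streams):
    """Merge audit counts from several colos into central totals,
    keyed by (tier, topic, window start). Each stream yields dicts
    like {"tier": ..., "topic": ..., "window": ..., "received": ...}."""
    totals = defaultdict(int)
    for stream in colo_streams:
        for msg in stream:
            key = (msg["tier"], msg["topic"], msg["window"])
            totals[key] += msg["received"]
    return dict(totals)

# Two colos each report 30000 messages for the same window:
colo1 = [{"tier": "agent", "topic": "benchmark_merge", "window": 60, "received": 30000}]
colo2 = [{"tier": "agent", "topic": "benchmark_merge", "window": 60, "received": 30000}]
print(merge_audit_counts([colo1, colo2]))
# {('agent', 'benchmark_merge', 60): 60000}
```

Keying by (tier, topic, window) is what lets audit queries later compare counts across tiers for the same publish window.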
Feeder can be started using the audit-client script which is part of conduit-audit package.
Steps to install the feeder:
The command to start the feeder is:
nohup /usr/local/conduit-audit-<version>/bin/audit-client feeder --conf <path to conf directory> 2>&1 &
The conf directory must contain the audit-feeder.properties file.
A detailed template of audit-feeder.properties is part of the conduit-audit package. Please refer to it for all other config params.