Overview
Conduit-lite refers to a conduit setup in cloud environments where producers typically generate low-volume data streams, so a dedicated Hadoop cluster may not be cost-effective. Instead, the Amazon S3 service is used to store data. S3 is designed to provide extremely high durability (99.999999999%, i.e. eleven 9's). Also note that there is no JobTracker to execute conduit-lite MR jobs; these jobs are executed in Hadoop's local runner (LocalJobRunner) instead.
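For reference, local execution in a Hadoop 1.x client is controlled by the mapred.job.tracker property. The minimal sketch below shows the standard Hadoop setting; conduit-lite's own wiring may differ:

<!-- mapred-site.xml: with mapred.job.tracker set to "local", MR jobs run
     in-process via the LocalJobRunner instead of being submitted to a
     JobTracker. -->
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>local</value>
  </property>
</configuration>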
Topology
The recommended topology to set up conduit-lite in a cluster is:
- Configure an instance of the scribe agent to run locally on the same box as the message publisher. The publisher writes messages to the local scribe agent, which provides high write throughput and isolation from any downstream failure. The scribe agent should be configured to write to the scribe collector(s); it is capable of handling various downstream failure scenarios and ensures there is no data loss by spooling data to its local disk and periodically de-spooling. A sample agent configuration is sketched after this list.
- Configure the scribe collector(s) to write to the S3 bucket used by the conduit-lite setup below.
- Conduit typically supports message sizes of ~2-3 KB; however, it has also been used successfully with messages of up to 10 KB.
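For reference, here is a minimal sketch of a scribe agent configuration that forwards to a collector and spools to local disk on downstream failure; the host name, path, and tuning values are illustrative assumptions, not recommended settings:

port=1463
check_interval=3

# Buffer store: forward to the collector; spool to local disk when it is
# down and de-spool automatically once it recovers.
<store>
category=default
type=buffer
retry_interval=30
retry_interval_range=10

<primary>
type=network
remote_host=collector-vip.example.com
remote_port=1463
</primary>

<secondary>
type=file
fs_type=std
file_path=/var/spool/scribe
base_filename=agent_buffer
max_size=100000000
</secondary>
</store>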
Setup on S3/S3N
Conduit-lite setup
The following link describes how to configure conduit-lite: Setup
Using S3NativeCopyMapper
The S3NativeCopyMapper class performs server-side copies within S3 buckets. This removes the need to stream the data from the S3 bucket to the EC2 client on which the copy mapper runs.
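For illustration only, the shape of such a mapper might look like the sketch below. This is not the actual S3NCopyMapper source: the class name, the key/value layout (source key in, destination key as value), and the s3.copy.bucket property are assumptions, and the AWS SDK client stands in for whatever S3 library the real implementation uses.

import java.io.IOException;

import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3ClientBuilder;

// Hypothetical sketch of a server-side S3 copy mapper (not the real class).
public class S3ServerSideCopyMapperSketch extends Mapper<Text, Text, Text, Text> {

  private AmazonS3 s3;
  private String bucket;

  @Override
  protected void setup(Context context) {
    // Credentials and region are resolved from the environment/instance profile.
    s3 = AmazonS3ClientBuilder.defaultClient();
    // "s3.copy.bucket" is an assumed job property naming the target bucket.
    bucket = context.getConfiguration().get("s3.copy.bucket");
  }

  @Override
  protected void map(Text srcKey, Text dstKey, Context context)
      throws IOException, InterruptedException {
    // copyObject is executed server side by S3: the object bytes are
    // duplicated inside S3 and never streamed through this EC2 client.
    s3.copyObject(bucket, srcKey.toString(), bucket, dstKey.toString());
    context.write(srcKey, dstKey);
  }
}

The design point is the same as the real mapper's: because the copy happens inside S3, the MR job's network cost is limited to the copy requests themselves rather than the object data.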
To use S3NativeCopyMapper within conduit-lite, configure the cluster in conduit.xml to use the copy mapper implementation as follows:
<cluster name="" hdfsurl="s3n:///"
         jturl=""
         jobqueuename=""
         copyMapperClass="com.inmobi.conduit.local.S3NCopyMapper"/>
Please note that the S3NativeCopyMapper works with the S3N filesystem.
Benchmarking
Here are the results of various benchmarking tests done on an EC2 cluster:
Latency
See Run benchmarks for how to reproduce these numbers.

Mean latency:
- Collector benchmark: 44900 ms
- Local benchmark: 206291 ms
- Merge benchmark: 293522 ms
- Mirror benchmark: 327832 ms
Notes:
- The above test covers the scenario of the scribe collector writing to an S3 bucket, with the conduit local/merge/mirror stream services processing files in S3. Latency is calculated by consumers that read the data written by the collector/local/merge/mirror stream writers.
- The merge/mirror streams are configured across different regions.
- This test measures mean latency (the average latency to read all messages). In general, the mean latencies for collector/local/merge/mirror are <1 min, 3-4 min, 4-5 min, and 5-6 min respectively.
- This test does not cover percentile latency. However, it should be possible to compute percentile latency using the conduit audit feature exposed through the audit dashboard. Please refer to Conduit visualization for more details.
Throughput
Refer to the producer/agent/collector throughput results.
Notes:
- This test finds the maximum acceptable write throughput of the producer/agent/collector that ensures steady state, i.e. no message loss at the producer and no spooling at the agent/collector.
- The observed throughput is highly dependent on the EC2 instance type used. In general, an m1.medium instance runs into a high load average/CPU %wa and gives inconsistent results when co-hosting the publisher and scribe agent. At least an m1.large instance is recommended.
Capacity planning
Capacity planning should be based on the numbers in the Throughput section above. The aim is to have a conduit-lite setup where the system runs in steady state, i.e. there is no message loss or spooling at any of the publisher, agent, or collector tiers.
- In order to write a total of 'N' MB/s through the conduit-lite layer, the minimum number of collector nodes needed = N / (max collector write throughput). If more than one collector is needed, you will need to configure a VIP layer between the scribe agents and the collectors (see the worked example after this list).
- Similarly, the minimum number of (publisher + scribe agent) nodes needed = N / (max agent write throughput) in steady state. Note that the publisher message queue size and scribe agent settings need to be configured appropriately.
- Scribe agent and collector nodes need to have sufficient disk space available in order to spool/de-spool data in case of a rate mismatch.
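Worked example, with illustrative numbers only (not measured results): suppose the benchmarks show a maximum collector write throughput of 20 MB/s and a maximum agent write throughput of 10 MB/s, and the target aggregate write rate is N = 50 MB/s. Then at minimum ceil(50 / 20) = 3 collector nodes are needed (behind a VIP, since more than one collector is involved), and ceil(50 / 10) = 5 (publisher + scribe agent) nodes are needed to stay in steady state, with spool headroom on disk as noted above.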