Setting up Hadoop is not exactly easy. Amazon's EMR strives to make it a lot easier by setting up a fully functional Hadoop MapReduce environment on the fly with the click of a button. And the best thing is, we can inject dynaTrace fully automatically as well!


In this article I will explain how to use bootstrap scripts in Amazon EMR to inject dynaTrace automatically. This requires the following steps:

  1. Making the dynaTrace Server accessible from the cloud (or starting a predefined AMI)
  2. Creating a new Hadoop-specific System Profile
  3. Starting AWS EMR with a bootstrap script
  4. Running jobs and analyzing them


Once you have started a dynaTrace Server inside EC2, all you need to do is create a new system profile and quickstart the Hadoop integration. After that is done, start your AWS EMR cluster and make sure to add the Compuware APM bootstrap script. This can be done via the Amazon Console or the EMR command line interface.

If you use the EMR console, add the shown bootstrap script and pass the EC2 private IP or DNS name of your dynaTrace Server as its argument.

If you want to use the command line client, add the following parameters to your Job Flow Create call.
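
As a rough sketch, a create call with a bootstrap action could look like the following when using the elastic-mapreduce command line client. The script location, job flow name and server address are placeholders chosen for illustration, not the actual values of the Compuware APM bootstrap script, so substitute your own. The snippet only prints the command so that it can be reviewed before it is run.

```shell
#!/bin/sh
# Placeholders (assumptions, not values from this article): replace all three.
DT_SERVER="10.0.0.12"                                        # EC2 private IP or DNS name of your dynaTrace Server
BOOTSTRAP_SCRIPT="s3://my-s3-bucket/dynatrace-bootstrap.sh"  # hypothetical location of the bootstrap script
JOBFLOW_NAME="dynatrace-hadoop-test"                         # hypothetical job flow name

# Print the Job Flow Create call instead of executing it (dry run).
# --bootstrap-action names the script to run on every node at startup,
# --args passes the dynaTrace Server address to that script.
emr_create_cmd() {
  echo "elastic-mapreduce --create --alive" \
       "--name $JOBFLOW_NAME" \
       "--bootstrap-action $BOOTSTRAP_SCRIPT" \
       "--args $DT_SERVER"
}

emr_create_cmd
```

Once the printed command looks right, it can be pasted into a shell that has the EMR command line client configured with your AWS credentials.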

The bootstrap script will automatically inject the dynaTrace agent into all daemons and MapReduce tasks that are executed on the Job Flow and wire them against the created system profile.

After this you are ready to go and analyze your first job.

Run a Job and Analyze the Results

The next step is to actually execute a job. As a test you can execute the word count sample:

The wordcount.jar is attached to this page and is a word count sample with a couple of different parameters (just download and decompile it if you are interested). It counts every occurrence of each word in every file under the s3://elasticmapreduce/samples/wordcount/input S3 location and stores the result in <my-s3-bucket> (supply your own).
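
A run like this could be submitted to the running Job Flow roughly as sketched below, again with the elastic-mapreduce command line client. This is an illustration under several assumptions: the job flow ID and bucket name are placeholders, the jar is assumed to have been uploaded to your own bucket, and the jar is assumed to take the input path followed by the output path as its arguments. Check the jar's actual parameters before relying on this.

```shell
#!/bin/sh
# Placeholders (assumptions for illustration): replace with your own values.
JOBFLOW_ID="j-XXXXXXXXXXXX"                          # ID of the Job Flow you started
WORDCOUNT_JAR="s3://my-s3-bucket/wordcount.jar"      # assumes you uploaded the jar to your own bucket
INPUT="s3://elasticmapreduce/samples/wordcount/input"
OUTPUT="s3://my-s3-bucket/wordcount/output"          # hypothetical output location

# Print the command that adds the jar as a step to the existing Job Flow (dry run).
# Assumption: the jar expects <input> <output> as positional arguments.
emr_wordcount_cmd() {
  echo "elastic-mapreduce --jobflow $JOBFLOW_ID" \
       "--jar $WORDCOUNT_JAR" \
       "--arg $INPUT" \
       "--arg $OUTPUT"
}

emr_wordcount_cmd
```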

In the dynaTrace client, open the MapReduce Business Transaction to see the result.


Amazon EMR makes it really easy to play with Hadoop and, as we've seen, it's really straightforward to inject dynaTrace into this environment fully automatically.

