He knows if you’ve been bad or good, so be good for goodness sake …
This is a hands-on narrated tour of how to use AppDynamics (AD) to detect trouble.
I want you to feel confident that you’ve mastered this skill. That’s why this takes a hands-on approach where you type in commands and we explain the responses and possible troubleshooting. This is a “deep dive” because all details are presented.
Like a good music DJ, I’ve carefully arranged the presentation of concepts into a sequence for easy learning, so you don’t have to spend as much time as me making sense of the flood of material around this subject.
Sentences that begin with PROTIP are a high point of this website: they point out wisdom and advice from experience. Sentences that begin with NOTE point out observations that many miss. Search for them if you only want “TL;DR” (Too Long; Didn’t Read) highlights.
Stuck? Contact me and I or one of my friends will help you.
Agents are installed on each machine you wish to monitor. AD claims “Up to 10,000 per controller” at “less than 2% overhead”.
They automatically discover servers (very cool, especially on Hadoop clusters).
Each agent is pointed at its Controller by a controller-info.xml file.
The Controller collects metrics reported by different types of agents:
Stand-alone machine agents obtain Java Thread Pool and other JVM stats.
End-User Monitoring (EUM) agents capture activity by listening to network traffic.
Database Monitors extract from databases:
AD has a Java Crash Guard
Cool features. So What?
There are several aspects that sales pitches don’t cover, but keep actual users up at night.
The Flow Map provides icons representing every server in each tier (Web-Tier-Services, Database, etc.). Each metric obtained is classified by comparing it against averages observed in the past.
PROTIP: AD gathers a huge amount of information every hour. How many GB per day is that rate?
PROTIP: There is a cost to keep data. Have a plan for how long to keep data on AD’s servers. Many people save just the summary information for management reporting.
PROTIP: Often we don’t realize what analysis to do until later, and by then the historical data is gone. How much is longitudinal operational analysis worth? This needs to be a decision by management so they are not disappointed later.
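The volume and retention questions above reduce to simple arithmetic. Here is a minimal sketch of it; every number is a made-up assumption for illustration (agent count, metrics per agent, bytes per sample, sampling rate), not a figure published by AD, so substitute values measured in your own environment.

```python
def daily_volume_gb(agents, metrics_per_agent, bytes_per_sample, samples_per_hour):
    """Estimate raw metric volume in GB per day (1 GB = 10**9 bytes)."""
    bytes_per_day = agents * metrics_per_agent * bytes_per_sample * samples_per_hour * 24
    return bytes_per_day / 1e9


def retained_gb(gb_per_day, retention_days):
    """Storage needed to keep that daily volume for retention_days."""
    return gb_per_day * retention_days


# Hypothetical inputs: 500 agents, 2,000 metrics each,
# 50 bytes per stored sample, one sample per minute.
per_day = daily_volume_gb(agents=500, metrics_per_agent=2000,
                          bytes_per_sample=50, samples_per_hour=60)
print(f"{per_day:.1f} GB/day, {retained_gb(per_day, 90):.0f} GB for 90 days")
```

Running the same calculation for a few retention periods gives management concrete numbers to weigh against the value of later analysis.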
AD exposes data via REST APIs [webinar].
PROTIP: REST APIs are rather granular. If a large amount of data needs to be extracted, a mechanism is needed to not overload the AD server by spreading out calls over time.
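One possible mechanism for spreading calls over time is a pacing generator that enforces a minimum interval between requests. This is only a sketch: the function name `paced` and the interval are assumptions, and the actual REST call is left to a caller-supplied function because the endpoint and credentials depend on your Controller.

```python
import time


def paced(items, min_interval_s, sleep=time.sleep, clock=time.monotonic):
    """Yield items no faster than one per min_interval_s seconds.

    sleep and clock are injectable so the pacing can be tested without waiting.
    """
    last = None
    for item in items:
        if last is not None:
            wait = min_interval_s - (clock() - last)
            if wait > 0:
                sleep(wait)
        last = clock()
        yield item


# Usage sketch: fetch_metric is a hypothetical function you would write
# around whatever REST call your Controller exposes.
# for metric_path in paced(metric_paths, min_interval_s=2.0):
#     fetch_metric(metric_path)
```

A fixed interval is the simplest choice. If the server signals overload, lengthening the interval after each failure would be a natural extension.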
AD provides a way to create graphs dynamically.
PROTIP: Consider dumping monitoring data for analysis in your organization’s analytics product, such as ElasticSearch, Tableau, etc. This leverages skills the organization already has and (more importantly) makes it easier and more likely for other statistics in the organization to be integrated, such as the impact performance has on dollar sales.
AD displays response time captured from every entry and exit point, down to specific page iframes.
PROTIP: How does each page compare against metrics predicted during performance testing? Identify surprises so you don’t have to look at what you already know.
PROTIP: Ultimately, organizations need to predict problems proactively rather than respond to alerts after they occur.
In my opinion, AD could do more than it does in this respect. For example, project the trend of disk space usage to identify the urgency, so the appropriate notification method is used.
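To make the disk space idea concrete, here is a minimal sketch that fits a straight line to daily usage samples and maps the projected days until full to a notification method. It is not something AD provides out of the box as far as this article knows; the function names, the thresholds (2 and 14 days), and the method names are all assumptions chosen for illustration.

```python
def days_until_full(samples, capacity_gb):
    """Project days from the last sample until the disk is full.

    samples is a list of (day_number, used_gb) pairs. A least-squares line is
    fitted; None is returned if there are too few samples or usage is not growing.
    """
    n = len(samples)
    if n < 2:
        return None
    mean_x = sum(x for x, _ in samples) / n
    mean_y = sum(y for _, y in samples) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in samples)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in samples)
    if sxx == 0:
        return None
    slope = sxy / sxx              # GB of growth per day
    if slope <= 0:
        return None
    intercept = mean_y - slope * mean_x
    full_day = (capacity_gb - intercept) / slope
    return max(0.0, full_day - samples[-1][0])


def notification_method(days_left):
    """Map urgency to a notification method (thresholds are assumptions)."""
    if days_left is None:
        return "none"
    if days_left <= 2:
        return "page"
    if days_left <= 14:
        return "email"
    return "ticket"


usage = [(0, 50), (1, 52), (2, 54), (3, 56)]   # made-up daily samples in GB
left = days_until_full(usage, capacity_gb=100)
print(left, notification_method(left))
```

A straight-line fit is the simplest projection. Bursty or seasonal growth would need a longer history or a different model.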
PROTIP: It helps to identify up front specifically who should be called (and when) for specific alerts that can appear.
PROTIP: Escalation specification is the specialty of PagerDuty software.
Browser distribution pie charts on Web App Dashboards are based on what clients report. Some networks strip that out, so watch out for “unknown”. These stats can differ significantly from Google Analytics if you also have that installed.
PROTIP: Break down the browser distribution chart by specific customer accounts so you can track their movement from IE to the modern browsers needed by React.js and other web apps.
Synthetic users are useful not just to identify issues during off-hours.
PROTIP: Synthetic users keep programs from being paged out of memory, which would cause delays (bad performance) for the first user who gets on the system in the morning.
AD has Health Rules and policies that trigger actions.
PROTIP: If the action is to send an email, someone is needed to constantly go through screens to notice problems.
This is a 3-shift job, with no breaks on weekends and holidays. So plan work schedules and training for this. Some use 12-hour workdays and offices across the globe for this reason.
PROTIP: Use “Chaos Monkeys” to create havoc randomly, in production, to ensure that responses are adequate.
Remediation scripts can take action after problems are detected.
PROTIP: Has that script been tested in production? Many organizations don’t have an environment to conduct such tests.
PROTIP: If the response is to add more servers, how much free server capacity is available?
When an issue is detected, the icon turns red and diagnostic snapshots are automatically captured. [video]
PROTIP: If the server is already down, there is not much point in taking diagnostic dumps, and doing so wastes precious time.
PROTIP: There are utilities which help you analyze dumps from the operating system, which can be quite cryptic.
End-user Response-Time Distribution spikes on Web App Dashboards identify the percentile of each spike.
PROTIP: A spike at the 50th percentile is more troubling, because of its consistency, than one at the 90th percentile or above.
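A small sketch shows why. It uses the nearest-rank percentile method on made-up response times in milliseconds; the data and the function name are assumptions for illustration, and AD may compute its percentiles differently.

```python
import math


def percentile(values, pct):
    """Nearest-rank percentile: the smallest value with at least pct% of samples at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


# Made-up samples: two slow outliers versus a slowdown that hits everyone.
tail_spike = [100, 110, 120, 130, 140, 150, 160, 170, 900, 2000]
broad_slowdown = [600, 610, 620, 630, 640, 650, 660, 670, 680, 690]

for name, data in (("tail spike", tail_spike), ("broad slowdown", broad_slowdown)):
    print(name, "p50 =", percentile(data, 50), "p90 =", percentile(data, 90))
```

In the first set only the 90th percentile jumps, so a few requests were slow. In the second the 50th percentile itself is high, meaning the typical user is affected.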
Data unique to app transactions or other GUID can be added by AD as HTTP headers for precise tracking.
This is a powerful way to pin-point the total amount of time on each tier.
Number of AD licenses vs number of servers in production is a crucial operational metric.
PROTIP: Many organizations need to use a “borrow Peter to pay Paul” mode of operation in order to stay within allocated budgets. This means licenses are pulled from one server to install in servers being actively considered. In such cases, consider using an alternative monitoring mechanism for servers which do not have AD agents. Such a switch can be a part of automated server build parameters.
AWS Agent installation
The AD Metric Browser can display metrics as a mash-up of data from cloud vendors.
For example, AD can analyze CloudWatch metrics of Amazon S3 service usage for buckets designated:
- Size: size of all the objects present in the bucket(s) configured.
- Objects Count: count of objects present in the bucket(s) configured.
- Since Last Modified: time since objects in the bucket(s) configured were last modified.
The code and configuration notes are at
Download file S3Monitor.zip
The jar file in it was compiled from https://github.com/Appdynamics/aws-s3-monitoring-extension (by Satish Muddam on 05/11/2015) and includes dependencies pulled in.
PROTIP: Some enterprises prefer to recompile and store it in their Artifactory repository for internal use rather than downloading from the internet unreviewed by ethical hackers.
Unzip it into the folder <machineagent install dir>/monitors/
The monitor.xml for AD points to the yml file. It’s in the config folder.
Open S3Configurations.yml in a text editor.
Fill in these fields:
- accesskey: copy and paste the access key for the S3 account.
- secretkey: copy and paste the secret key for the S3 account.
- metricPrefix: the metric prefix path for the AppDynamics Controller.
- onlyConsolidatedMetric: true or false (default). Set it to true if only the consolidated metric is required; bucket-wise metrics will then not be pushed to the Controller.
- noOfThreads: Min 1, Max 32, Default 8.
- sizeunit: B, KB (Default), MB.
- timeunit: Seconds (Default), Minutes, Hours, Days.
- bucketNames: bucket names to monitor. To monitor all available buckets, remove this field or add a bucket named “All”.
To avoid permission issues, install the agent as the same user who owns the Machine Agent files or as an administrator on the host machine.
PROTIP: To detect any typos, use a YML validator such as http://www.yamllint.com, even when the file is generated.
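A syntax validator will not catch values that parse fine but fall outside the allowed ranges listed above. The sketch below checks an already-parsed configuration (a plain dict here) against those ranges. The function name is hypothetical, and the key spellings follow this article; the real file may use different casing, so adjust to match yours.

```python
ALLOWED_SIZE_UNITS = {"B", "KB", "MB"}
ALLOWED_TIME_UNITS = {"Seconds", "Minutes", "Hours", "Days"}


def check_s3_config(cfg):
    """Return a list of problems found in a parsed S3Configurations.yml mapping."""
    problems = []
    for key in ("accesskey", "secretkey", "metricPrefix"):
        if not cfg.get(key):
            problems.append(f"{key} is missing or empty")
    threads = cfg.get("noOfThreads", 8)            # default per the list above
    if type(threads) is not int or not 1 <= threads <= 32:
        problems.append("noOfThreads must be an integer from 1 to 32")
    if cfg.get("sizeunit", "KB") not in ALLOWED_SIZE_UNITS:
        problems.append("sizeunit must be B, KB, or MB")
    if cfg.get("timeunit", "Seconds") not in ALLOWED_TIME_UNITS:
        problems.append("timeunit must be Seconds, Minutes, Hours, or Days")
    if type(cfg.get("onlyConsolidatedMetric", False)) is not bool:
        problems.append("onlyConsolidatedMetric must be true or false")
    return problems


# Made-up example values; never commit real keys.
example = {
    "accesskey": "EXAMPLE-ACCESS-KEY",
    "secretkey": "EXAMPLE-SECRET-KEY",
    "metricPrefix": "Custom Metrics|Amazon S3|",
    "noOfThreads": 64,
    "sizeunit": "GB",
}
for problem in check_s3_config(example):
    print(problem)
```

Loading the YAML file into a dict is left out on purpose, since that needs a third-party parser of your choice.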
Review the monitor.xml.
Restart the machineagent.
The Standalone Machine Agent (Machine Agent) starts within its own JVM. In 4.2, JRE 1.8 is bundled with the OS-specific Machine Agent installation downloads.
In the AppDynamics Metric Browser, look for: Application Infrastructure Performance | Tier | Custom Metrics | Amazon S3.
There is work necessary to instrument code objects to reveal them in monitoring.
Proxies or firewalls on the network may need to be opened up for the agent to talk to the Controller at default port 8090.
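Before installing the agent, it can save time to confirm from the agent’s machine that the Controller port is reachable. Here is a minimal sketch using a plain TCP connection; the function name and the host name in the usage line are placeholders, and a successful connection only shows the port is open, not that the Controller behind it is healthy.

```python
import socket


def can_reach(host, port=8090, timeout_s=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout_s."""
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


# Usage sketch: replace the placeholder with your Controller's host name.
# print(can_reach("controller.example.com", 8090))
```

If this returns False from the application server but True from your workstation, a proxy or firewall between the two is the likely cause.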
From the AD Home screen Getting Started section, click “Agent Download & Install Wizard”.
AppServerAgent-184.108.40.206.zip, 11 MB
Unzip the file.
The folder contains a javaagent.jar file.
Add the Java Agent to the application’s Java process by adding a -javaagent option to the Java startup arguments that start the application.
AD provides the line to add for each type of Application Server:
On Tomcat this means editing the catalina.sh file:
sudo cp catalina.sh catalina.sh.backup
sudo gedit catalina.sh
Use a text editor to add a line before Java is invoked:
export CATALINA_OPTS="$CATALINA_OPTS -javaagent:/home/appduser/appd-agent/javaagent.jar"
If you’re not ready for a server restart, attach the agent to a running process instead:
sudo java -Xbootclasspath/a:/usr/lib/jvm/java-1.7.0-openjdk-220.127.116.11.x86_64/lib/tools.jar -jar /home/appduser/appd-agent/javaagent.jar
Don’t press Enter to submit until you get its PID:
sudo ps -A | grep java
Highlight and copy the process number returned to paste at the end of the command above.
Invoke one of the sample apps to impose load such as at:
- The 1st section of the doc at
The AD Controller can integrate with various other systems.
But it’s through one-way HTTPS
The biggest concern enterprises have with cloud services is their security.
is the one-stop shop.
AppSphere in 2016 is on November 14 at the Marriott Cosmopolitan Las Vegas.