Friday, June 26, 2009

Notes on Performance and Scalability

blog.dynatrace.com
blog.codecentric.com
highscalability.com

Scalability: increased resources result in increased performance
- Werner Vogels

Scalability
- The ability to maintain performance as load increases.

Performance
- Characteristics such as response time and processing volume.

Queuing Theory
- Modeling resources
- Resources and waiting requests are modeled as queues
- EX: CPU and current requests
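
As a rough illustration of the queue model above, the sketch below uses the textbook M/M/1 formula (one server, random arrivals and service times). That model is an assumption and is not named in these notes; the function name and the numbers are hypothetical.

```python
def mm1_response_time(service_time_s: float, arrival_rate_per_s: float) -> float:
    """Mean response time (waiting plus service) of an M/M/1 queue."""
    utilization = arrival_rate_per_s * service_time_s
    if not 0 <= utilization < 1:
        raise ValueError("queue is unstable: utilization must be below 1")
    # R = S / (1 - U): response time grows sharply as utilization nears 1.
    return service_time_s / (1.0 - utilization)


if __name__ == "__main__":
    # A CPU that needs 10 ms per request, under increasing load.
    for rate in (10, 50, 90, 99):
        r = mm1_response_time(0.010, rate)
        print(f"{rate:3d} req/s -> response {r * 1000:.1f} ms")
```

Under these assumptions the response time doubles at 50% utilization and is ten times the service time at 90%, which is why a resource can look fine in single user mode and still collapse under load.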

Queuing Networks
- Modeling of app as network of resources
- Combining multiple queues
- EX: CPU, Network, Connection Pools, Servlet Threads
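
A minimal sketch of the network view, assuming every request passes through each resource once and that each resource has a fixed number of parallel units. The class, the resource figures, and the simplification itself are hypothetical, not taken from the notes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resource:
    name: str
    servers: int           # parallel units, e.g. cores or pooled connections
    service_time_s: float  # mean time one request occupies one unit

    def max_throughput(self) -> float:
        """Upper bound on requests per second this resource can serve."""
        return self.servers / self.service_time_s


def bottleneck(resources):
    """The resource with the lowest capacity bounds the whole system."""
    return min(resources, key=lambda r: r.max_throughput())


if __name__ == "__main__":
    app = [
        Resource("CPU", servers=8, service_time_s=0.005),
        Resource("Connection pool", servers=10, service_time_s=0.020),
        Resource("Servlet threads", servers=50, service_time_s=0.050),
    ]
    worst = bottleneck(app)
    print(worst.name, worst.max_throughput())
```

Adding capacity anywhere other than the bottleneck resource does not raise the system limit in this model.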

Scalability and Requirements
- Increasing number of parallel requests
- Increasing number of parallel users
- Increasing volume of data
- Increasing availability with same performance

Types of Scalability
- Vertical (UP)
  A physical node gets more resources (CPU or memory)
  In virtual environments, resources can be added "on demand"
- Horizontal (OUT)
  A new physical node is added
  Trend toward more and more servers
  Distributed system

If a system is fast in single-user mode but slow under heavy load, then the problem is scalability, not performance
- Cameron Purdy

Scalability does not improve performance
- Increases complexity and degrades performance
- Slower in single user mode
- performance is still an issue

Scalability can improve availability
- Scale-out introduces redundancy
- Synchronize status/state

Scalability never comes for free
- Must be engineered into the application
- Introduces additional complexity
- Plan time for building scalability into app

Limiting factors
- CPU time
  scalability via hardware
- Memory consumption
  scalability via hardware
- I/O and network
  limited and harder to scale
  architectural changes
- Shared resource access
  limited and harder to scale
  architectural changes

Major problems
- Database access
  High number of database requests
  Locks and concurrent access
- Remoting behavior
  Bad interaction design (communication patterns)
  High data volume
  (it's too easy to make remote calls)
- Locking
  Configuration problems (connection pools)
  Wrong synchronization patterns
  Serial data access

Metrics
- Throughput
- Response times
  compared with CPU time
- System metrics
  CPU and garbage collection
  Memory pools (including generations)
- Application metrics
  connection pools
  component level
  SQL
  Transactional trace data
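
A small sketch of how throughput and response-time figures might be derived from raw samples. The nearest-rank percentile method, the function names, and the sample data are assumptions chosen for illustration.

```python
import math


def percentile(sorted_values, p):
    """Nearest-rank percentile of an ascending list; p in (0, 100]."""
    rank = math.ceil(p * len(sorted_values) / 100)
    return sorted_values[max(rank, 1) - 1]


def summarize(response_times_s, window_s):
    """Throughput plus response-time distribution for one measurement window."""
    ordered = sorted(response_times_s)
    return {
        "throughput_per_s": len(ordered) / window_s,
        "mean_s": sum(ordered) / len(ordered),
        "median_s": percentile(ordered, 50),
        "p99_s": percentile(ordered, 99),
        "max_s": ordered[-1],
    }


if __name__ == "__main__":
    # 19 fast requests and one slow outlier, measured over 10 seconds.
    samples = [0.1] * 19 + [2.0]
    for key, value in summarize(samples, window_s=10).items():
        print(f"{key}: {value:.3f}")
```

Reporting percentiles next to the mean keeps a single slow request visible instead of averaging it away.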

Problems:
- Complex interdependencies due to frameworks
- High serialization
- Full heap due to memory leak
- Inefficient or redundant remote calls
- Too many SQL calls

Database access
- Wrong use of O/R mappers
  Configuration/loading behavior
- Bad transaction design
  Connection kept too long
  Isolation level incorrectly defined (or not at all)
- Inefficient data loading logic
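
One common form of inefficient loading is the "N+1 queries" pattern, which lazy-loading O/R mappers can produce. The sketch below reproduces it by hand with Python's built-in sqlite3 module; the schema, table names, and the manual statement counter are hypothetical.

```python
import sqlite3


def make_db():
    """In-memory database with 100 customers and one purchase each."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE purchase (id INTEGER PRIMARY KEY,
                               customer_id INTEGER, total REAL);
    """)
    conn.executemany("INSERT INTO customer VALUES (?, ?)",
                     [(i, f"c{i}") for i in range(1, 101)])
    conn.executemany("INSERT INTO purchase (customer_id, total) VALUES (?, ?)",
                     [(i, 10.0 * i) for i in range(1, 101)])
    return conn


def totals_n_plus_one(conn):
    """One query for the customers, then one more query per customer."""
    statements = 1
    result = {}
    for cid, name in conn.execute("SELECT id, name FROM customer").fetchall():
        row = conn.execute(
            "SELECT SUM(total) FROM purchase WHERE customer_id = ?", (cid,)
        ).fetchone()
        statements += 1
        result[name] = row[0]
    return result, statements


def totals_single_join(conn):
    """Same result with a single statement."""
    rows = conn.execute("""
        SELECT c.name, SUM(p.total)
        FROM customer c JOIN purchase p ON p.customer_id = c.id
        GROUP BY c.id, c.name
    """).fetchall()
    return dict(rows), 1


if __name__ == "__main__":
    db = make_db()
    print(totals_n_plus_one(db)[1], "statements vs", totals_single_join(db)[1])
```

Against a real database each statement is also a network round trip, so the difference grows with the number of rows.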

Distributed Systems/Remoting
- Bad interface design
  Interfaces are too generic
  Interactions not suitable for SOA
- Wrong communication protocols
  SOAP services in homogeneous landscape
  Sync instead of async interactions
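
To make the cost of a chatty interface concrete, the sketch below uses a stand-in "remote" service that only counts round trips. The class and method names are hypothetical and no real remoting library is involved.

```python
class FakeRemoteCatalog:
    """Stand-in for a remote service; counts round trips, does no network I/O."""

    def __init__(self, prices):
        self._prices = dict(prices)
        self.round_trips = 0

    def get_price(self, item_id):
        """Fine-grained call: one item per round trip."""
        self.round_trips += 1
        return self._prices[item_id]

    def get_prices(self, item_ids):
        """Coarse-grained call: one round trip for a whole batch."""
        self.round_trips += 1
        return {i: self._prices[i] for i in item_ids}


def cart_total_chatty(catalog, item_ids):
    return sum(catalog.get_price(i) for i in item_ids)


def cart_total_batched(catalog, item_ids):
    prices = catalog.get_prices(set(item_ids))
    return sum(prices[i] for i in item_ids)


if __name__ == "__main__":
    cart = [1, 2, 3]
    chatty = FakeRemoteCatalog({1: 5, 2: 7, 3: 11})
    batched = FakeRemoteCatalog({1: 5, 2: 7, 3: 11})
    print(cart_total_chatty(chatty, cart), chatty.round_trips)
    print(cart_total_batched(batched, cart), batched.round_trips)
```

Both variants return the same total; only the number of round trips differs, and that number is what latency multiplies.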

Synchronization
- Locks kept too long
- Wrong lock granularity
- Neglecting locking at DB level
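
A minimal sketch of keeping a lock only as long as needed, assuming the expensive part of the work touches no shared state. The class and the workload are hypothetical.

```python
import threading


class HitCounter:
    """Per-URL hit statistics shared between threads."""

    def __init__(self):
        self._lock = threading.Lock()
        self._hits = {}

    def record(self, url, payload):
        # Work on thread-local data happens outside the lock.
        size = len(payload.encode("utf-8"))
        # Only the read-modify-write of shared state is protected.
        with self._lock:
            count, total = self._hits.get(url, (0, 0))
            self._hits[url] = (count + 1, total + size)

    def snapshot(self):
        with self._lock:
            return dict(self._hits)


def run_workers(counter, threads=8, per_thread=1000):
    """Hammer the counter from several threads and wait for them to finish."""
    def work():
        for _ in range(per_thread):
            counter.record("/a", "abc")

    pool = [threading.Thread(target=work) for _ in range(threads)]
    for t in pool:
        t.start()
    for t in pool:
        t.join()


if __name__ == "__main__":
    c = HitCounter()
    run_workers(c)
    print(c.snapshot())
```

Moving the encoding step inside the `with` block would still be correct, but every thread would then wait for work that needs no protection.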

Serial access
- Access MUST be handled serially
- Scalability logically limited
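
The limit imposed by a serial section can be estimated with Amdahl's law, which these notes do not mention by name; it is brought in here as an assumption. The function name and the 10% figure are hypothetical.

```python
def amdahl_speedup(serial_fraction: float, workers: int) -> float:
    """Best-case speedup when only the non-serial part can be parallelized."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / workers)


if __name__ == "__main__":
    # With 10% of the work serialized, speedup can never reach 10x.
    for n in (1, 2, 8, 64, 1024):
        print(f"{n:5d} workers -> speedup {amdahl_speedup(0.10, n):.2f}")
```

Under this model, adding nodes gives diminishing returns, and the only way to raise the ceiling is to shrink the serial part.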

Too much synchronous program logic
- Developers think procedurally
- Sync interaction in distributed systems is problematic
- High resource usage
- Typical indicator: low CPU while other resources are saturated
- Classic example: web apps
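
The difference between waiting on calls one after another and overlapping the waits can be sketched with Python's asyncio. The fake remote call and the in-flight tracker are hypothetical; `asyncio.sleep` stands in for network latency.

```python
import asyncio


class InFlight:
    """Tracks how many simulated calls are waiting at the same time."""

    def __init__(self):
        self.current = 0
        self.peak = 0


async def fake_remote_call(tracker, delay_s=0.01):
    tracker.current += 1
    tracker.peak = max(tracker.peak, tracker.current)
    await asyncio.sleep(delay_s)  # the caller is idle here, using no CPU
    tracker.current -= 1


async def sequential(n):
    """Each call waits for the previous one to finish."""
    tracker = InFlight()
    for _ in range(n):
        await fake_remote_call(tracker)
    return tracker.peak


async def concurrent(n):
    """All calls are started first, then awaited together."""
    tracker = InFlight()
    await asyncio.gather(*(fake_remote_call(tracker) for _ in range(n)))
    return tracker.peak


if __name__ == "__main__":
    print("sequential peak:", asyncio.run(sequential(5)))
    print("concurrent peak:", asyncio.run(concurrent(5)))
```

In the sequential version the total wait is the sum of all delays while the CPU stays idle, which matches the low-CPU indicator listed above.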

Memory management
- Bad GC config
  Generation size; collection strategy
- Unnecessary creation of objects
  Serialization
  O/R mapping frameworks
- Memory leaks
  Bad reference clearing logic
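
A frequent source of leaks is a cache that only ever grows. The sketch below shows one possible remedy, a size-bounded LRU cache; the class name and the eviction policy are assumptions chosen for illustration.

```python
from collections import OrderedDict


class BoundedCache:
    """LRU cache with a hard size limit, so evicted entries can be collected."""

    def __init__(self, max_entries):
        self._max = max_entries
        self._data = OrderedDict()

    def put(self, key, value):
        self._data[key] = value
        self._data.move_to_end(key)
        while len(self._data) > self._max:
            self._data.popitem(last=False)  # drop the least recently used entry

    def get(self, key, default=None):
        if key in self._data:
            self._data.move_to_end(key)  # mark as recently used
            return self._data[key]
        return default

    def __len__(self):
        return len(self._data)


if __name__ == "__main__":
    cache = BoundedCache(max_entries=3)
    for i in range(10):
        cache.put(i, f"value-{i}")
    print(len(cache), cache.get(0), cache.get(9))
```

A plain dictionary used the same way would hold all ten values, and in a long-running server it would keep growing until the heap is full.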

Problems in development
- Don't understand dynamic program behavior
- Sacrifice scalability and performance in favor of functionality
- Waiting for symptoms
- Real problems often only visible under high load

Questions
- Tools: JMeter, Selenium, network sniffer, soapUI
- check DB and/or remoting at continuous integration time