# DD Script



## Mun (Mar 29, 2014)

I know many of you hate LET now, so I am posting here for feedback as well.

http://lowendtalk.com/discussion/24393/dd-script#latest

I built a bash + crontab script that creates a list of your historical dd as well as one current one.

It can be seen here. Current: http://192.3.139.124/dd.txt Historical: http://192.3.139.124/dd_historical.txt
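For context, the script itself isn't posted in the thread; here's a minimal sketch of what such a dd-logging cron script might look like. The paths, log format, and default size are my assumptions (the thread's test used 1 GB, i.e. SIZE_MB=1024):

```shell
#!/bin/sh
# Sketch of a dd-logging script. The original isn't posted, so paths,
# log format, and the default size here are assumptions; the test in
# the thread used 1 GB (SIZE_MB=1024).
SIZE_MB=${SIZE_MB:-64}
TESTFILE=${TESTFILE:-/tmp/ddtest.img}
LOG=${LOG:-$HOME/dd_historical.txt}

# conv=fdatasync (GNU dd) flushes to disk before dd reports a speed,
# so the number reflects the disk rather than the page cache.
RESULT=$(dd if=/dev/zero of="$TESTFILE" bs=1M count="$SIZE_MB" conv=fdatasync 2>&1 | tail -n 1)
echo "$(date -u '+%Y-%m-%d %H:%M:%S') $RESULT" >> "$LOG"
rm -f "$TESTFILE"
```

Pointing LOG at a web root and running this from cron would produce something like the dd_historical.txt linked above.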

My fear is that it will be abused and cause people to aimlessly use this to "monitor" their server. Though you may say mine was aimless as well, I disagree. In any case, I'd like to hear your thoughts on whether I should make this available, along with a poll.

Mun


----------



## yolo (Mar 29, 2014)

This

Is

The

Stupidest

Thing

In

The

World

I could not think of anything worse that anybody has made to 'monitor' their VPS.


----------



## Packety (Mar 29, 2014)

hmm, why not?


----------



## Mun (Mar 29, 2014)

yolo said:


> This
> 
> Is
> 
> ...



Well, you only live once, so who gives a fuck, right?


----------



## yolo (Mar 29, 2014)

Mun said:


> Well You only live once, so who gives a fuck, right?


If you want to be banned from every VPS provider, yeah!


----------



## Mun (Mar 29, 2014)

I mostly just hate the fact that your username is yolo, but I hear your point.

Mun


----------



## tonyg (Mar 29, 2014)

I like the idea but not the execution. After some time all this data will be unreadable.

Either:

Graph the data

or

Reduce the number of runs each day to maybe every six or eight hours


----------



## Mun (Mar 29, 2014)

tonyg said:


> I like the idea but not the execution. After some time all this data will be unreadable.
> 
> Either:
> 
> ...





I did, actually. I think it's at 12 hours now?


----------



## tonyg (Mar 29, 2014)

I thought I saw runs every 2 hours.


----------



## Mun (Mar 29, 2014)

tonyg said:


> I thought I saw runs every 2 hours.



You did; I later changed it after I noticed an improvement on the host's node and saw no need to check every few hours (plus I sorta felt bad).


----------



## SkylarM (Mar 29, 2014)

Mun said:


> You did; I later changed it after I noticed an improvement on the host's node and saw no need to check every few hours (plus I sorta felt bad)


If everyone started running a DD on a cronjob every x hours I'd honestly set up monitoring to stop and/or limit the tests. That'd add huge unnecessary disk activity for no real reason.


----------



## tchen (Mar 29, 2014)

Sweet jesus, 1GB!?


----------



## Mun (Mar 29, 2014)

tchen said:


> Sweet jesus, 1GB!?


Do you have a suggestion on which DD test to use? I can modify the script to use less, as long as people still see it as a viable test.


----------



## Flapadar (Mar 29, 2014)

Mun said:


> You did; I later changed it after I noticed an improvement on the host's node and saw no need to check every few hours (plus I sorta felt bad)


And so you should. DD shouldn't be run to test write speed unless you're using it as one of several methods to prove a problem to your provider. 

The only time you should even consider releasing that script is if it checks steal time and iowait to see if they're high before running the test. However, that will still make the problem worse!


----------



## Mun (Mar 29, 2014)

I think I proved an issue with my provider when the tests were showing a nearly constant I/O speed of less than 20 MB/s.

Mun


----------



## Flapadar (Mar 29, 2014)

Mun said:


> I think I proved an issue with my provider when the tests were showing a nearly constant I/O speed of less than 20 MB/s.
> 
> Mun


Why not just monitor iowait and steal time instead?


----------



## Mun (Mar 29, 2014)

Flapadar said:


> Why not just monitor iowait and steal time instead?




I did those as well.


----------



## Flapadar (Mar 29, 2014)

Mun said:


> I did those as well.


Why do you see DD as a necessary part of the script?


----------



## tchen (Mar 29, 2014)

Mun said:


> Do you have a suggestion on which DD test to use? I can modify the script to use less, as long as people still see it as a viable test.


I really can't recommend any sustained DD test.  If you're on full virt like KVM or Xen, just read /proc/diskstats.  On OpenVZ, the only thing I'd recommend is a 'single' ioping.  At regular intervals (much like what you're doing now with dd), it still paints a pretty good picture of any IO degradation before it becomes an issue, without seriously adding to the problem.

The problem with 1GB, or anything sizeable, is that it's going to blow out any cache.  At that point you're really testing only the raw spindle speed, not anything related to how the node is set up.  To boot, that's going to fluctuate as the IO scheduler interleaves you with the other guy doing the dd test.  

P.S. I chose ioping because it defaults to just 4k.  That's more indicative of the random seeks from your database / apache than sustained dd.
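tchen's /proc/diskstats suggestion needs no benchmark at all; a sketch (field positions follow the kernel's iostats documentation, and the device-name filter is an assumption):

```shell
# Dump cumulative I/O counters from /proc/diskstats: field 3 is the
# device name, 4 = reads completed, 8 = writes completed, 13 = total
# time spent doing I/O (ms). Sampling at intervals and diffing the
# counters shows degradation without generating any disk traffic.
awk '$3 !~ /^(loop|ram)/ {
    printf "%-10s reads=%s writes=%s busy_ms=%s\n", $3, $4, $8, $13
}' /proc/diskstats
```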


----------



## Mun (Mar 29, 2014)

IOWAIT is good for determining whether your processes are being held up by disk latency, but it isn't a good indicator of whether that IOWAIT comes from how the program is written or from the disk getting hammered.

Mun


----------



## SkylarM (Mar 29, 2014)

Mun said:


> I think I proved an issue with my provider when they where getting nearly constant I/O speed of less then 20MB/s.
> 
> 
> Mun


If you have to "prove" an issue to your provider before they will look into a disk performance issue, then you likely need a new provider*.

* I mean stop giving GVH money.

Releasing an easy-to-access script that chain-runs DD tests (whether on a timer or not) will likely result in an increase in abuse. There are people out there too lazy or unfamiliar with this sort of thing who would gladly raise hell with a provider if a script that does all the work for them is easily accessible. Keep that in mind.


----------



## tchen (Mar 29, 2014)

One other thing: if, despite all the recommendations not to release this, you still decide to do so, then at the very least embed a randomized delay into the script.  The last thing anyone needs is a bunch of people running this script on a node on the hour, at exactly the same time.
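The jitter is a one-liner, with one gotcha: cron's default /bin/sh may be dash, which has no $RANDOM, so this sketch pulls two bytes from /dev/urandom instead (the half-hour window and the script path are assumptions):

```shell
# Random delay of 0-1799 seconds that works in plain /bin/sh
DELAY=$(( $(od -An -N2 -tu2 /dev/urandom) % 1800 ))
echo "sleeping ${DELAY}s before the dd test"

# Corresponding crontab entry, twice a day with jitter
# (dd_log.sh is a hypothetical name for the logging script):
#   0 */12 * * * sleep $(( $(od -An -N2 -tu2 /dev/urandom) % 1800 )) && /usr/local/bin/dd_log.sh
```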


----------



## DomainBop (Mar 29, 2014)

tonyg said:


> I thought I saw runs every 2 hours.


Every 2 hours is bad! If you run the script during even hours and the abusers only abuse during odd hours, you'll never know that the node you're on has dd speeds below 10 MB/s half the day! Solution: run it every hour!


----------



## Nett (Mar 29, 2014)

Great script to run on shitty servers.


----------



## tchen (Mar 29, 2014)

Nett said:


> Great script to run on shitty servers.


It's the great equalizer.  They'll all be shitty servers thanks to this script.


----------



## qrwteyrutiyoup (Mar 29, 2014)

tchen said:


> It's the great equalizer.  They'll all be shitty servers thanks to this script


LOL


----------



## dcdan (Mar 29, 2014)

Reminds me of this:

http://geekahost.com/db/show.php?mode=overview&server_id=21 (scroll down)

One day people will realize that this dd test is pointless in 99.9% of cases.


----------



## GVH-Jon (Mar 29, 2014)

Why do you have to DD spam _us_?  <_<


----------



## GIANT_CRAB (Mar 29, 2014)

GVH-Jon said:


> Why do you have to DD spam _us_?  <_<


Why would anyone want to target you, unless you're an enemy of theirs?

Why fear even one enemy, unless you have hundreds of enemies?


----------



## kaniini (Mar 29, 2014)

Mun said:


> IOWAIT is good for determining whether your processes are being held up by disk latency, but it isn't a good indicator of whether that IOWAIT comes from how the program is written or from the disk getting hammered.
> 
> Mun


If you have a proper VM, i.e. not OpenVZ, you can measure request latency times to determine this.


----------



## manacit (Mar 30, 2014)

lol, I didn't realize you made this dumb post in two places. I'll quote myself on LET:

Another Mun post.

Here, I went ahead and did it myself. It took all of 15 minutes and I rarely do any bash scripting - why is this whole thing such a big deal again?

https://gist.github.com/nickvanw/9863660


----------



## tchen (Mar 30, 2014)

@manacit stop trolling.


----------



## raindog308 (Mar 30, 2014)

yolo said:


> This
> 
> Is
> 
> ...


I think it's worse than that, actually.

It's also lame.

I mean, what is this other than `dd if=/something of=/something bs=something >> data.txt`?  "Built a bash + crontab script"?  Really?  It's not like you're even parsing the data and putting it in MySQL, producing graphs, etc.  What exactly is there to "release"?  One line of "code"?

Besides the other suggestions, ioping is another useful thing.


----------



## Deleted (Mar 30, 2014)

This is one of the most useless 'scripts' I have ever seen in my 20 years of being around the IT field. 

ioping is equally useless on Linux because of the shared page (vsyscall) between userland and the kernel, which will not give you accurate results. Not only that, context switching is expensive on x86. The only way to measure raw access time is to do it within the kernel, not userland.


----------



## DomainBop (Mar 30, 2014)

> ioping is equally useless on Linux because of the shared page (vsyscall) between userland and the kernel, which will not give you accurate results.


Agreed about ioping tests, and if we want to take it a step farther, choosing a VPS provider based primarily on the "blazing fast" dd and ioping results the provider posted in their offer is equally pointless.


----------



## tchen (Mar 30, 2014)

DomainBop said:


> Agreed about ioping tests, and if we want to take it a step farther, choosing a VPS provider based primarily on the "blazing fast" dd and ioping results the provider posted in their offer is equally pointless.


Call me a noob here.  So the timing functions used by ioping go through vsyscall (I assume gettimeofday) on both ends of the heavier aio/block IO calls.  Sure, the file system calls get hit with the context switch, but where does the inaccuracy come in?  Are we saying that it doesn't measure the raw disk spindle, which I thought was a given since the test was designed as a kernel-userland interface test?  Or did I misunderstand and there's something more fundamental elsewhere?  I'm curious.


----------



## Deleted (Mar 30, 2014)

tchen said:


> Call me a noob here.  So the timing functions used by ioping go through vsyscall (I assume gettimeofday) on both ends of the heavier aio/block IO calls.  Sure, the file system calls get hit with the context switch, but where does the inaccuracy come in?  Are we saying that it doesn't measure the raw disk spindle, which I thought was a given since the test was designed as a kernel-userland interface test?  Or did I misunderstand and there's something more fundamental elsewhere?  I'm curious.


Where does the inaccuracy come from? 

- gettimeofday(), on Linux, is set up with a shared page designed to cache/avoid a full read from hardware (depending on how good your timesources are), which makes it less expensive to call. It's not going to be completely accurate this way, either. Speed != accuracy. That is why other OSes like FreeBSD have slower gettimeofday()/clock_gettime(): they perform a full lookup directly from the hardware. Mac has the commpage, which is similar. This is a pessimistic issue, though (you're talking about maybe ~50us on VPSes and modern CPUs).

- VPSes do not give direct access to hardware. You need ring0 to access, say, the PIT/ACPI timers, but you don't need either to call rdtsc(). Some/most instructions are emulated.

- When VPSes call write() they do some funky caching stuff because, again, they do not have direct access to hardware.

ioping is only useful on dedicated servers, not VPS containers. You'll see wild results and they won't be consistent.


----------



## tchen (Mar 30, 2014)

Monkburger said:


> ioping is only useful on dedicated servers, not VPS containers. You'll see wild results and they won't be consistent.


Thanks for the instruction above.  I do see ioping results fluctuate wildly on VPS but I had always chalked that up to general noise from interleaved access by neighbours.  I guess it wasn't just that.

When put through a time series, though, it does have a relatively stable mean distribution.  Enough to work with, as long as you're not that guy on LET who cherry-picks the lowest value to start an "It's a fraud" thread.


----------



## raindog308 (Mar 30, 2014)

Monkburger said:


> ioping is only useful on dedicated servers, not VPS containers. You'll see wild results and they won't be consistent.


Interesting... though I've seen good ioping times from good hosts and bad ioping times from bad hosts (where good/bad means "non-/oversubscribed IO").

I never was much for benchmarks in the absolute sense, though.  If I'm solving a problem, having a before/after change benchmark is invaluable, but to just get on a host and run some benchmark.sh...pointless.


----------



## drmike (Mar 31, 2014)

Big question: is there anything useful that can be run from within a container to deduce a performance baseline at all?

The disk lust in the low end is notorious.

I've run the pedestrian things aforementioned and still do... usually when I see laggy or just-off nodes...

Time for some of us (not me) to fashion better metrics for container analysis.


----------



## tchen (Mar 31, 2014)

drmike said:


> Big question: is there anything useful that can be run from within a container to deduce a performance baseline at all?


It kinda depends on what you actually want to measure.  Despite our talk about the emulation layers in FS calls, if what you wanted was to measure app-to-metal performance rather than raw, it still does that.  It'll just return a large range given:

1) Time of day.

2) You're on a shared multi-tenant system.

3) File system/device is cached.

4) IO scheduler is nonlinear.

5) vsyscall may be cached.

I list these factors in descending order of effect.  #1 is diurnal.  #3 is somewhat controllable via the test method, and #5 is infinitesimally small compared to #4.  #2-4 have reasonably strong covariances.  Just keep that in mind when parsing the data.  As a preliminary to performance testing:

http://msdn.microsoft.com/en-us/library/bb924370.aspx

P.S. the above applies, regardless of virtualization/bare-metal, etc.


----------



## Deleted (Apr 1, 2014)

Running benchmarks within any virtualized environment is futile. You will never receive consistent results because of emulation (timers, interrupts), non-direct access to hardware, CPU cache thrashing from the excessive amount of context switching, emulated instructions, etc.

No matter what 'hardware assisted' VT-x stuff you enable in the BIOS, the above still applies.


----------



## drmike (Apr 1, 2014)

Well, there need to be metrics, measurements, etc. of these environments.  It's long been an issue, with inconsistent/random results.  Top empty-node master metrics are equally flawed; somewhat better, but indicative merely of empty-server potential.

Simply put, how does an end user really say much of anything when their container is laggy, the disk impossibly slow, and the CPU non-existent?   Saying just that falls into accusation territory, subject to mass lashings from the fanboys of said company.


----------



## tonyg (Apr 1, 2014)

So according to some people here, VPS benchmarks are meaningless...interesting.

So what do these people do to get a sense of performance?

Maybe they run the actual application?

Well, wouldn't it too be subjected to the same environment as a synthetic benchmark and have periods of highs and lows?

So then why even run a VPS?


----------



## manacit (Apr 1, 2014)

tonyg said:


> So according to some people here, VPS benchmarks are meaningless...interesting.
> 
> So what do these people do to get a sense of performance?
> 
> ...


Sure, they would be subject to potential periods of low performance, but a good VPS host's low performance should be adequate. The difference between a host worth using and one that isn't is consistently good performance.


----------



## dcdan (Apr 1, 2014)

I would use the Apache benchmark tool (ab) to benchmark the whole setup. You could do it at certain intervals too, then feed the results into mrtg or rrdtool or whatever.

If for some reason I absolutely had to make sure I/O is fast, then I'd probably create a big file (200 GB if possible), and then read a few 4K chunks at random locations of that file every 5 minutes & watch that it does not take over 50-100ms (per chunk) to complete (or 5-10ms per chunk for SSD-based storage).
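A rough sketch of that probe using plain dd (the file is shrunk to 16 MB here to keep the example cheap; dcdan's ~200 GB and the 50-100ms threshold are the real-world numbers, and iflag=direct is GNU-dd specific):

```shell
#!/bin/sh
# Read one 4K block at a random offset and let dd report the timing.
FILE=${FILE:-/tmp/ioprobe.img}
[ -f "$FILE" ] || dd if=/dev/zero of="$FILE" bs=1M count=16 2>/dev/null

BLOCKS=$(( $(stat -c %s "$FILE") / 4096 ))
OFFSET=$(( $(od -An -N4 -tu4 /dev/urandom) % BLOCKS ))

# O_DIRECT bypasses the page cache so the read hits the disk; fall back
# to a cached read on filesystems (e.g. tmpfs) that don't support it.
dd if="$FILE" of=/dev/null bs=4k count=1 skip="$OFFSET" iflag=direct 2>&1 \
  || dd if="$FILE" of=/dev/null bs=4k count=1 skip="$OFFSET" 2>&1
```

Run from cron every 5 minutes, parsing the reported time against the threshold would give dcdan's alert.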


----------



## manacit (Apr 1, 2014)

Personally, I don't put anything on a LEB that I actually *need* - I only use a RamNode VPS as staging and the rest of my infrastructure is on dedicated servers.

If I were to host things, I'd simply monitor things like iowait and steal time, and make sure my pages weren't loading too slowly.


----------



## drmike (Apr 1, 2014)

That @manacit, a man after my own heart.

That's how I use VPS instances (not just LowEnd*), but all.   I use them as sandboxes for play.   

When things are production level, dedicated servers or colo units (I prefer these).

Even iowait, mentioned above, gets poked at as a bad indicator of much in a virtual environment.    SSDs and SSD caching likely further distort/create weirdness with iowait.   Been seeing that a bit here and there over the past few years of testing things.  

Steal time: what exactly is that, and in theory how does one measure it?  Not a term I use regularly.


----------



## tchen (Apr 1, 2014)

Steal time is time the kernel can't account for as user, system, or idle.  In a virtual environment, if you have some CPU bandwidth limiter, the kernel will see work being done outside its purview; effectively the hypervisor 'steals' it back.  This becomes the %st.  I've seen it in heavily limited environments like AWS micro instances.  I'm not sure you can even get it under OpenVZ containers, although I haven't looked too closely at it.

The million-dollar question is: is it stealing because the machine is overloaded, or is it something less sinister?  The answer is the latter, at least in the AWS case.  To make sense of %st, you need to look at your actual /proc/cpuinfo.  In the EC2 case, the listed CPU clock speeds are way higher than the virtual CPU that's assigned per node.  Because of that, once you reach the allotted work on that vCPU, there's technically still some available work on the real CPU.  The hypervisor needs to reassign the CPU to another VM, which is why it appears to be stolen.

The real answer will depend on your provider, but since no one outside of the big-iron bunch ever bothers to list what their logical CPU 'core' really is, there's no way to be certain it's not overloaded.  The only bright side is that since it's such a misunderstood term and was horribly named 'steal time', no provider wants to be caught having a consistent %st.
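For anyone who wants to watch the raw counter behind %st: it's the eighth value on the `cpu` line of /proc/stat (user, nice, system, idle, iowait, irq, softirq, steal), in USER_HZ ticks. A minimal sketch:

```shell
# Print cumulative steal ticks since boot; diffing two samples taken an
# interval apart gives the %st that top and vmstat display.
awk '/^cpu / { printf "steal ticks since boot: %s\n", $9 }' /proc/stat
```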


----------



## Deleted (Apr 2, 2014)

The only way to know is from the host node itself, not from the containers; that information is not exposed to the containers (since they run in ring3).

The good thing about a VPS is that the translation layer for block I/O is cached, so you'll always get decent write speed (and latency) compared to dedicated (because of the vfs cache on the host node).

What needs to happen is for loadavg to carry two sets of values: your container's 1/5/15 and the host node's 1/5/15 averages. Then you could tell whether it's a piece of shit or not.


----------



## Thelen (Apr 13, 2014)

SkylarM said:


> If everyone started running a DD on a cronjob every x hours I'd honestly set up monitoring to stop and/or limit the tests. That'd add huge unnecessary disk activity for no real reason.


+1 there; as a provider, 99% of disk IO complaints are FROM dd tests lol.


----------

