[C++] extreme fast counting of files in directory

William · Mar 12, 2014

I have used this on a directory with ~4 million files; it takes around 0.5s, while ls takes... well... no idea, it never finishes.

compile with "g++ count.cpp -o count && mv count /usr/sbin"

Usage: ./count "<path>"

Code:

#include <stdio.h>
#include <dirent.h>

int main(int argc, char *argv[])
{
    if(argc != 2)
    {
        printf("Usage: ./count \"<path>\"\n");
        return 1;
    }

    struct dirent *de;
    DIR *dir = opendir(argv[1]);
    if(!dir)
    {
        printf("opendir() failed! Does it exist?\n");
        return 1;
    }

    /* Counts every entry readdir() returns, including "." and "..". */
    unsigned long count = 0;
    while ((de = readdir(dir)) != NULL)
    {
        ++count;
    }

    closedir(dir);
    printf("%lu\n", count);

    return 0;
}

fixidixi · Mar 12, 2014

Just caught my eye: count is unsigned long. That's 0 to 4 294 967 295; are there more files than that?

strace it?

peterw · Mar 12, 2014

I can wait 0.5 seconds: ls -a | wc -l

HostUS-Alexander · Mar 12, 2014

Nice share.

- Alexander

William · Mar 12, 2014

peterw said:
I can wait 0.5 seconds: ls -a | wc -l

Yea, for your 100 files maybe

root@db:/# time count /home/db/

4183976

real 0m0.254s

root@db:/# time ls /home/db/ | wc -l

4183974

real 0m14.002s

rds100 · Mar 12, 2014

Any programmer who decides it's OK to store several million files in a single directory shouldn't be allowed to touch a computer again.

.

Francisco · Mar 12, 2014

rds100 said:
Any programmer who decides it's OK to store several million files in a single directory shouldn't be allowed to touch a computer again.

.

You know, running our backups node has been a real test of that very comment.

You'd be simply amazed how many million+ inode qmail queue folders we have on here.

Francisco

fixidixi · Mar 12, 2014

@William care to tell us if you could solve this?

William · Mar 12, 2014

rds100 said:
Any programmer who decides it's OK to store several million files in a single directory shouldn't be allowed to touch a computer again.

.

Well, sort of. It is of zero IO concern for me: I don't count them often, and I *know* the filename of each file I need to copy/read (stored in a DB). It also saves me the steps of cutting characters and using subdirs.

fixidixi said:
@William care to tell us if you could solve this?

No, I didn't write it. I got it, copyleft included, from a friend.

raindog308 · Mar 12, 2014

#!/usr/bin/perl
opendir (D,$ARGV[0]) || die;
while ($file=readdir(D)) { $count++; }
print $count . "\n";

Code:

$ ./count.pl /tmp
146

There are other ways of doing this...see http://www.perlmonks.org/?node_id=606766

Technically, there are some limitations (which are also present in the C++ code):

You'll probably count '.' and '..'
You're not descending recursively, nor testing that the directory entry you're counting is a file, directory, link, pipe, etc.
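
Both limitations can be addressed without leaving the readdir() loop. The following is only a sketch, not the code from the first post: count_regular_files and is_regular_file are hypothetical names, and it assumes a POSIX system where fstatat() and dirfd() are available. Counting "." and ".." may also be why count reported two more entries than ls in the timings earlier in the thread.

```cpp
#include <dirent.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <cstring>

// Hypothetical helper (not from the thread): true if `de` names a regular file.
static bool is_regular_file(DIR *dir, const struct dirent *de)
{
#if defined(DT_REG) && defined(DT_UNKNOWN)
    // Fast path: many filesystems report the entry type in d_type.
    if (de->d_type != DT_UNKNOWN)
        return de->d_type == DT_REG;
#endif
    // Slow path: one extra system call per entry, without following symlinks.
    struct stat st;
    return fstatat(dirfd(dir), de->d_name, &st, AT_SYMLINK_NOFOLLOW) == 0
        && S_ISREG(st.st_mode);
}

// Counts regular files directly inside `path` (no recursion).
// Returns -1 if the directory cannot be opened.
long long count_regular_files(const char *path)
{
    DIR *dir = opendir(path);
    if (!dir)
        return -1;

    long long count = 0;
    struct dirent *de;
    while ((de = readdir(dir)) != nullptr)
    {
        // Skip the two special entries explicitly.
        if (std::strcmp(de->d_name, ".") == 0 || std::strcmp(de->d_name, "..") == 0)
            continue;
        if (is_regular_file(dir, de))
            ++count;
    }
    closedir(dir);
    return count;
}
```

On filesystems that fill in d_type this should cost about the same as the plain loop; where the type comes back as DT_UNKNOWN, the per-entry fstatat() call will likely dominate the run time on a directory with millions of entries.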

I'd be curious to know what the speed difference is between perl and C. I don't have a directory with 4 million files lying around though.

BTW, your code is technically C++ because C++ is a superset of C, but it's really just C.

raindog308 · Mar 12, 2014

fixidixi said:
Just caught my eye: count is unsigned long. That's 0 to 4 294 967 295; are there more files than that?

strace it?

unsigned long is platform dependent. The standard only guarantees at least 32 bits (0 to 2^32-1).

long long, on the other hand, is guaranteed to be at least 64 bits on all platforms, with a guaranteed range of -(2^63-1) to 2^63-1.

If you know that you are running on a 64-bit platform, then unsigned long will get you to 18,446,744,073,709,551,615. Even @Francisco doesn't have 18 quintillion files in one directory. Though if he does, I bet they're all animated .GIFs and viewing that directory in Windows Explorer would cause a singularity of some sort.

http://en.wikipedia.org/wiki/Long_integer#Common_long_integer_sizes
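
To see what a given compiler actually provides, the widths can be checked directly. This is a small sketch, with value_bits and print_widths as made-up helper names; the static_asserts encode the minimums the C and C++ standards promise (at least 32 bits for unsigned long, at least 64 for unsigned long long, exactly 64 for uint64_t where it exists).

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <limits>

// Minimum widths guaranteed by the standards, checked at compile time.
static_assert(std::numeric_limits<unsigned long>::digits >= 32,
              "unsigned long has at least 32 bits");
static_assert(std::numeric_limits<unsigned long long>::digits >= 64,
              "unsigned long long has at least 64 bits");
static_assert(std::numeric_limits<std::uint64_t>::digits == 64,
              "uint64_t has exactly 64 bits");

// Hypothetical helper: number of value bits in integer type T
// (the sign bit of signed types is not counted).
template <typename T>
constexpr int value_bits()
{
    return std::numeric_limits<T>::digits;
}

// Prints the widths for the platform this is compiled on.
void print_widths()
{
    std::printf("unsigned long:      %d bits\n", value_bits<unsigned long>());
    std::printf("unsigned long long: %d bits\n", value_bits<unsigned long long>());
    std::printf("size_t:             %d bits\n", value_bits<std::size_t>());
    std::printf("uint64_t:           %d bits\n", value_bits<std::uint64_t>());
}
```

Note that "64-bit platform" does not settle the question by itself: unsigned long is 64 bits on typical 64-bit Linux, but 32 bits on 64-bit Windows.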

William · Mar 12, 2014

I tested it quickly:

Your code takes an average of 0m0.728s (over 10 tests, with a variance of up to .020)

The C++ takes an average of 0m0.254s (over 10 tests, with a variance of up to .003, an extremely stable result)

raindog308 said:
If you know that you are running on a 64-bit platform, then unsigned long will get you to 18,446,744,073,709,551,615.

Good, I only use 64-bit anyway.

qrwteyrutiyoup · Mar 12, 2014

To be on the safe side, you might want to

#include <stdint.h>
and use

Code:

uint64_t

Wintereise · Mar 12, 2014

qrwteyrutiyoup said:
To be on the safe side, you might want to

#include <stdint.h>
and use

uint64_t

+1.

raindog308 · Mar 12, 2014

William said:
I tested it quickly:

Your code takes an average of 0m0.728s (over 10 tests, with a variance of up to .020)

The C++ takes an average of 0m0.254s (over 10 tests, with a variance of up to .003, an extremely stable result)

That is about what I'd expect. A lot of that .5 is probably the perl interpreter start up. I suspect if that count was run multiple times in the same script, subsequent calls would be faster.
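
One way to check a guess like this is to time the scan itself inside a single process, so that process (or interpreter) startup is excluded. The sketch below does that for the C++ side only; scan_once and time_scans are hypothetical names, and nothing like this was measured in the thread.

```cpp
#include <chrono>
#include <dirent.h>
#include <vector>

// Hypothetical helper: number of entries readdir() returns for `path`
// (including "." and ".."), or -1 if the directory cannot be opened.
long scan_once(const char *path)
{
    DIR *dir = opendir(path);
    if (!dir)
        return -1;
    long n = 0;
    while (readdir(dir) != nullptr)
        ++n;
    closedir(dir);
    return n;
}

// Runs up to `runs` consecutive scans in this process and returns the
// wall-clock time of each one in milliseconds. Stops early if a scan fails.
std::vector<double> time_scans(const char *path, int runs)
{
    std::vector<double> ms;
    for (int i = 0; i < runs; ++i)
    {
        const auto start = std::chrono::steady_clock::now();
        const long n = scan_once(path);
        const auto stop = std::chrono::steady_clock::now();
        if (n < 0)
            break;
        ms.push_back(std::chrono::duration<double, std::milli>(stop - start).count());
    }
    return ms;
}
```

Later scans may also be faster simply because the kernel has cached the directory, so a difference between the first and subsequent runs is not evidence of startup cost alone.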

kaniini · Mar 14, 2014

qrwteyrutiyoup said:
To be on the safe side, you might want to

#include <stdint.h>
and use

uint64_t

Actually, you should use size_t per C99: size_t is meant to be wide enough to hold the size of any allocation (in multiples of 1 or more). So, size_t should be used.

qrwteyrutiyoup · Mar 15, 2014

kaniini said:
Actually, you should use size_t per C99: size_t is meant to be wide enough to hold the size of any allocation (in multiples of 1 or more). So, size_t should be used.

Good point. For the purpose of this program size_t is better suited, even if one cannot tell the size of the variable without knowing the architecture it's going to run on.
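
Whichever type ends up holding the counter, the printf conversion has to change with it: the %lu in the original program is only correct for unsigned long. A minimal sketch of the two alternatives discussed here, with format_u64 and format_size as made-up helper names:

```cpp
#include <cinttypes>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <string>

// Hypothetical helper: formats a uint64_t counter. PRIu64 expands to the
// correct conversion for uint64_t on the current platform.
std::string format_u64(std::uint64_t n)
{
    char buf[32];
    std::snprintf(buf, sizeof buf, "%" PRIu64, n);
    return buf;
}

// Hypothetical helper: formats a size_t counter with the %zu conversion.
std::string format_size(std::size_t n)
{
    char buf[32];
    std::snprintf(buf, sizeof buf, "%zu", n);
    return buf;
}
```

Passing a uint64_t or size_t to %lu happens to work where unsigned long has the same width, but it is not portable, which is the same trap as assuming the width of unsigned long in the first place.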
