# Check if a user is on a VPN / proxy



## black (Jan 24, 2015)

I made a tool that lets admins see how likely an IP address is to be a proxy/VPN IP. The system returns that likelihood as a probabilistic value.

This should help forum admins, online shops, etc. If you have problems with people bypassing bans / trolls / fraud prevention, this tool should be useful. 

I hear that MaxMind doesn't classify VPNs / proxies as high risk, so this might be useful as another layer of detection.

The proxy check system uses:

- 2 static files (manually updated)
- 1 dynamic file
- 6 unique dynamic checks
- 1 cached IPs file (to reduce the number of dynamic check queries)

Here's the full documentation / readme - http://check.getipaddr.net/

The proxy check system has served 175k unique IP lookups in the past ~2 months, and the system is out of the development stage.


----------



## drmike (Jan 24, 2015)

Interesting project @black.
 
One point to expand upon your good work.
 
Unsure if it's already a feature, but make the API work when NO IP is provided, so that it checks the requesting party's own IP address. That's good for those of us who want to check our own IP from time to time to make sure it's "clean", without the extra step(s) of detecting our public IP first.
 
http://check.getipaddr.net/check.php?ip=

Also consider expanding the project with another API flag that skips the check and just echoes the requester's IP.


----------



## black (Jan 24, 2015)

drmike said:


> Interesting project @black.
> 
> 
> One point to expand upon your good work.
> ...


Thanks for the recommendations. You can visit http://check.getipaddr.net/check.php without giving the parameter "ip" and it'll check your own IP address. As for echoing your own IP address, you can just go to http://getipaddr.net (which is curl and wget friendly)
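For scripted use, the two call styles above (with and without the "ip" parameter) could be wrapped roughly like this in Python. This is a minimal sketch, not part of the service's documentation: the helper names `build_check_url` and `fetch_score` are made up for illustration, and it assumes the endpoint answers with a bare number as plain text, as the scores quoted in this thread suggest.

```python
from typing import Optional
from urllib.parse import urlencode
from urllib.request import urlopen

CHECK_URL = "http://check.getipaddr.net/check.php"  # endpoint named in this thread


def build_check_url(ip: Optional[str] = None) -> str:
    """Return the lookup URL; with no ip, the service checks the caller's own address."""
    if ip is None:
        return CHECK_URL
    return CHECK_URL + "?" + urlencode({"ip": ip})


def fetch_score(ip: Optional[str] = None, timeout: float = 10.0) -> float:
    """Fetch the score for ip (hypothetical helper; assumes a plain-text numeric body)."""
    with urlopen(build_check_url(ip), timeout=timeout) as response:
        return float(response.read().decode("ascii").strip())
```

Calling `fetch_score()` with no argument corresponds to visiting check.php without the parameter, while `fetch_score("192.0.2.1")` looks up a specific address.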


----------



## devonblzx (Jan 24, 2015)

I'm kind of curious as to how it works.   I know you may not want to release that information.  I only received 0's and 1's on the IPs I tested, it looks like there is supposed to be a score between 0 and 1 for most addresses though.

I'm curious as to how common false positives are.  Does it search the ASN / net block / ports opened, etc?


----------



## black (Jan 24, 2015)

devonblzx said:


> I'm kind of curious as to how it works.   I know you may not want to release that information.  I only received 0's and 1's on the IPs I tested, it looks like there is supposed to be a score between 0 and 1 for most addresses though.
> 
> I'm curious as to how common false positives are.  Does it search the ASN / net block / ports opened, etc?


I have 2 static ban lists. One takes CIDRs and the other takes AS numbers. The contents of these lists are added manually. If an IP is on one of these lists, it is explicitly banned, so the system returns 1. The dynamic file is a list of Tor IPs, updated every few hours; those are also explicitly banned and return a value of 1.

If the IP address isn't on any of these lists, the system asks a slave node to do a dynamic check on the IP. A dynamic check looks for characteristics that a proxy/VPN IP would have and a residential IP wouldn't. For example, if multiple IPs in the /24 are hosting a bunch of websites, then we can say with some probability that this IP is a proxy/VPN. That's just 1 dynamic check. There are 5 other unique ones that the proxy check system uses, and each returns a different probability. The slave node returns these 6 probabilistic values, which are then modeled as a reliability system in a parallel configuration, where r_i is the reliability of characteristic i.

I don't mind discussing how the system works, but for obvious reasons I don't wish to disclose all the characteristics I look for in a dynamic check. You'll get a value of 0 if the IP address doesn't appear to have the characteristics of a proxy / VPN (as determined by the proxy check system).

So basically, if the IP isn't explicitly banned, the system will look for characteristics of the IP to determine how likely it is to be a proxy/VPN.
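The flow described above could look roughly like the following Python sketch. It is an illustration under stated assumptions, not the actual implementation: the thread does not give the combining formula, so the sketch uses the textbook parallel-reliability formula, 1 minus the product of (1 - r_i). The function names, the sample CIDR, and the input probabilities are all made up, and the ASN list is left out because it would need an IP-to-ASN lookup.

```python
import ipaddress
import math
from typing import Iterable, Set


def combine_parallel(probabilities: Iterable[float]) -> float:
    """Textbook parallel-reliability combination: 1 - product of (1 - r_i)."""
    return 1.0 - math.prod(1.0 - r for r in probabilities)


def score_ip(ip: str, banned_cidrs: Iterable[str], tor_ips: Set[str],
             dynamic_results: Iterable[float]) -> float:
    """Return 1.0 for explicitly banned IPs, else the combined dynamic score."""
    address = ipaddress.ip_address(ip)
    if any(address in ipaddress.ip_network(cidr) for cidr in banned_cidrs):
        return 1.0  # on the static CIDR ban list
    if ip in tor_ips:
        return 1.0  # on the Tor IP list
    return combine_parallel(dynamic_results)


banned = ["203.0.113.0/24"]  # illustrative documentation range, not a real ban
print(score_ip("203.0.113.7", banned, set(), []))  # 1.0
print(round(score_ip("198.51.100.9", banned, set(), [0.55, 0.65]), 4))  # 0.8425
```

With this formula, a single strong signal is enough to push the combined score up, and the result is 0 only when every check returns 0.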


----------



## trewq (Jan 24, 2015)

What does a score of -2 mean?


----------



## black (Jan 24, 2015)

trewq said:


> What does a score of -2 mean?


-2 is Invalid IP address (ipv6 is not supported)


----------



## RTGHM (Jan 24, 2015)

Well, I see it doesn't like my non-proxy IP.

Was picked up as 1 instantly, and my actual proxy was picked up as 0


----------



## black (Jan 24, 2015)

RTGHM said:


> Well, I see it doesn't like my non-proxy IP.
> 
> Was picked up as 1 instantly, and my actual proxy was picked up as 0


Can you PM me your non-proxy IP and if you want, your proxy IP?


----------



## trewq (Jan 24, 2015)

black said:


> -2 is Invalid IP address (ipv6 is not supported)


So why do you have a AAAA record? What's the point of having it accessible over IPv6 if it doesn't work properly?


----------



## black (Jan 24, 2015)

trewq said:


> So why do you have a AAAA record? What's the point of having it accessible over IPv6 if it doesn't work properly?


There's no AAAA record for check.getipaddr.net, but I use Cloudflare's "automatic IPv6" feature. This allows getipaddr.net to work with IPv6.


----------



## trewq (Jan 24, 2015)

black said:


> There's no AAAA record for check.getipaddr.net but I use cloudflare's "automatic IPv6" feature. This allows getipaddr.net work with ipv6.


The record is there... It means if you try and get the score on your own IP and you're running dual stack, it won't work.


----------



## black (Jan 24, 2015)

trewq said:


> The record is there... It means if you try and get the score on your own IP and you're running dual stack, it won't work.


Yeah, the AAAA record points to Cloudflare's IPs. It's a Cloudflare feature. I did not explicitly add an AAAA record for check.getipaddr.net and I can't turn it off for a specific subdomain.

Are you using some sort of NAT VPS with IPv6 addresses? In Debian you can prefer IPv4 by editing /etc/gai.conf and adding the following line:


```
precedence ::ffff:0:0/96  100
```


----------



## trewq (Jan 24, 2015)

black said:


> Yeah, the AAAA record is cloudflare's IPs. It's a cloudflare feature. I did not explicitly add an AAAA record for check.getipaddr.net and I can't turn it off for a specific subdomain.
> 
> 
> Are you using some sort of NAT VPS with IPv6 addresses? In Debian you can set prefer IPv4 by editing
> ...


Ah ok, thought you could in cloudflare. Nope, native on home connection.


----------



## black (Jan 24, 2015)

trewq said:


> Ah ok, thought you could in cloudflare. Nope, native on home connection.


Ah ok. Well, if you're using any Debian-based Linux distro, changing gai.conf should make it use your IPv4 address. I'm sure it's something similar in other *nix flavors. I'm not too sure about Windows.


----------



## trewq (Jan 24, 2015)

black said:


> Ah ok. Well if you're using any Debian based linux distro, changing gai.conf should use your ipv4 address. I'm sure it's something similar in other *nix flavors. I'm not too sure about windows.


This was actually on Android. It's not an issue anymore, just thought it seemed silly to have it IPv6 accessible when it doesn't work with IPv6.


----------



## black (Jan 24, 2015)

trewq said:


> This was actually on Android. It's not an issue anymore, just thought it seemed silly to have it IPv6 accessible when it doesn't work with IPv6.


Yeah, I had to do it the lame way (throw an error).


----------



## KuJoe (Jan 25, 2015)

For those who want to integrate this into a PHP script, here's the code I use for my control panel:


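```
// Skip the lookup for IPv6 addresses, which the service rejects with -2.
if (filter_var($ip, FILTER_VALIDATE_IP, FILTER_FLAG_IPV6) != true) {
    // URL-encode the value so unexpected characters can't alter the query string.
    $proxychk = file_get_contents("http://check.getipaddr.net/check.php?ip=" . urlencode($ip));
} else {
    $proxychk = '-2';
}
echo $proxychk;
```

Works nicely with FraudRecord for quick screening without having to spend any money.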


----------



## Kayaba Akihiko (Jan 25, 2015)

Just wondering, is this any more effective than just banning every datacenter IP?

https://github.com/Zalvie/nginx_block_files


----------



## William (Jan 25, 2015)

Gives me 0.55 for my non-proxy/VPN external 4G IP.


----------



## rds100 (Jan 25, 2015)

Well, it gives 1 for all our business IPs (which are not proxies / VPNs), so... take the results with a grain of salt.


----------



## lbft (Jan 25, 2015)

0.8425 for my residential connection. Clearly this needs some work.


----------



## KuJoe (Jan 25, 2015)

lbft said:


> 0.8425 for my residential connection. Clearly this needs some work.


It checks the IPs around you in your /24 (I think it's /24 at least) and if any of them are running proxies/VPNs/webservers then it affects your score. It's great for new IP blocks that are picked up by hosting providers that haven't been tagged as such yet by databases.
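To make the "/24 around you" idea concrete, here is a small Python sketch that lists the neighboring addresses such a check would have to look at. It only covers the enumeration step. What the service actually does with each neighbor is not public, the /24 size is the guess stated above, and the function name is made up for illustration.

```python
import ipaddress
from typing import List


def neighbors_in_slash24(ip: str) -> List[str]:
    """Return the other usable host addresses in the /24 that contains ip."""
    # strict=False lets us pass a host address and get its enclosing network.
    network = ipaddress.ip_network(f"{ip}/24", strict=False)
    return [str(host) for host in network.hosts() if str(host) != ip]


neighbors = neighbors_in_slash24("198.51.100.77")
print(len(neighbors))  # 253
```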


----------



## RTGHM (Jan 25, 2015)

KuJoe said:


> For those who want to integrate this into a PHP script, here's the code I use for my control panel:
> 
> 
> if (filter_var($ip, FILTER_VALIDATE_IP, FILTER_FLAG_IPV6) != true) {
> ...


_file_get_contents_


----------



## KuJoe (Jan 25, 2015)

RTGHM said:


> _file_get_contents_


If you know a better method I'm all ears. I'm not a software developer by any stretch of the imagination (ask vld, he did a complete audit of Wyvern). If you're hinting at the potential security issues, then I would recommend adding more validation and sanitization to the script (I left it out as I was just providing the code to get the score).


----------



## black (Jan 25, 2015)

Kayaba Akihiko said:


> Just wondering, is this any more effective than just banning every datacenter IP?
> 
> https://github.com/Zalvie/nginx_block_files


Cakey and I swap AS lists. I think at this point, if he's using my list as well, it's a lot bigger than the one published on github.

This system:

- has a lot more ASNs banned than what's published on GitHub.

- has fine-grained control for ASNs that offer both residential and server hosting (by using CIDR bans instead of ASN bans).

- is able to infer whether an IP is or isn't a proxy if it's not statically banned.

For these reasons, I think it's better.



William said:


> Gives me 0.55 for my non-proxy/VPN external 4G IP.


Yeah, 0.55 isn't something to worry about. The server is saying "I'm 55% sure this is a proxy", which is like guessing a coin toss. For administrators who use this service, I recommend flagging a user (not explicitly banning them) for values > 0.75 (at the minimum).
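On the consuming side, that recommendation could be applied with something like this Python sketch. The action labels and the function name are hypothetical. The 0.75 threshold is the minimum suggested above, and -2 is the invalid-address value mentioned earlier in the thread.

```python
FLAG_THRESHOLD = 0.75  # minimum threshold suggested in the post above
INVALID_IP = -2.0      # returned for invalid (including IPv6) addresses, per the thread


def classify_score(score: float, threshold: float = FLAG_THRESHOLD) -> str:
    """Map a score to a hypothetical action label: 'invalid', 'flag' or 'allow'."""
    if score == INVALID_IP:
        return "invalid"
    if score > threshold:
        return "flag"  # queue for manual review rather than banning outright
    return "allow"


print(classify_score(0.55))    # allow
print(classify_score(0.8425))  # flag
```

Keeping the threshold as a parameter makes it easy to tighten or loosen the rule once you see how many legitimate users get flagged.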


----------



## black (Feb 25, 2015)

rds100 said:


> Well, it gives 1 for all our business IPs (which are not proxies / VPNs), so... take the results with a grain of salt.


Sorry I just saw this. Can you give me some of the IPs in question? I'll look into it.


----------



## fixidixi (Feb 25, 2015)

@black:

well I've also got 0.55 :|


----------



## libro22 (Feb 25, 2015)

black said:


> Thanks for the recommendations. You can visit http://check.getipaddr.net/check.php without giving the parameter "ip" and it'll check your own IP address. As for echoing your own IP address, you can just go to http://getipaddr.net (which is curl and wget friendly)


Nice project! Is it possible to use getipaddr.net to check info (from more page) of other IPs?


----------



## rds100 (Feb 25, 2015)

black said:


> Sorry I just saw this. Can you give me some of the IPs in question? I'll look into it.


Just check AS16154. Of all the prefixes from this AS only several /24s are for VPS and dedicated server customers.


----------



## black (Feb 25, 2015)

rds100 said:


> Just check AS16154. Of all the prefixes from this AS only several /24s are for VPS and dedicated server customers.


My apologies. This AS must've been banned when I first started the project and didn't yet have fine-grained control or the dynamic checks. All of its IP blocks (except for a few) have been removed from the ban list and it's no longer a banned AS.



fixidixi said:


> @black:
> 
> well I've also got 0.55 :|


Nothing to worry about. 55% isn't much better than a coin flip (probabilistically speaking).



libro22 said:


> Nice project! Is it possible to use getipaddr.net to check info (from more page) of other IPs?


Not at this time. getipaddr.net is made to query your own IP address. I don't plan to expand it further unless there's some serious demand.

----------------

From Feb 1st to Feb 25 (now), the proxy check system has served ~950k queries. No one has really contacted me about any corrections (except for rds100, whose case I have corrected), so I think that implies things are working pretty well. Moving forward, please let me know about any issues you have.

Thanks.


----------



## fixidixi (Feb 25, 2015)

@black:

May I ask what kind of setup/node served those 950k queries?


----------



## black (Feb 25, 2015)

fixidixi said:


> @black:
> 
> May I ask what kind of setup/node served those 950k queries?


1 master (main HTTP server) and 6 slave nodes (for dynamic checks). If I continue to develop this project, I'll make the master server semi-redundant / distributed as well.


----------



## black (Apr 30, 2015)

Some updates: I'm working on a new version. Use check.dynamic.php instead of check.php.

Dynamic checks are faster and there are more of them. I've added detection for 'bad agents' like spammers as well.

There are 9 dynamic checks in the beta version.

The backend slave servers are multi-threaded.

The old system got an update as well, and everything now runs in RAM. I've adjusted some values because one attribute was being too heavy-handed.


----------



## GIANT_CRAB (May 1, 2015)

Still giving a 1 for my residential IP. Not sure how this can even be relied on...


----------



## black (May 1, 2015)

GIANT_CRAB said:


> Still giving a 1 for my residential IP. Not sure how this can even be relied on...


Can you PM me your IP within the /24? Thanks.


----------



## black (Jul 12, 2015)

There have been some major improvements, so I thought I'd let people know. Firstly, I've curated my own datasets, about 40 GB in size, which I maintain on a daily basis. This means that queries take around 150 - 300 ms instead of the 3 to 11 secs of previous versions. I've upped the query limit from 40 to 80 queries a minute. There are 15+ unique dynamic checks at this point, compared to < 10 in previous versions. When it comes to boosting in machine learning, having more weak classifiers (in this case, more dynamic checks) generally gives a better result.
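If you query in bulk, a small client-side throttle is one way to stay under the 80 queries a minute limit. The Python sketch below is only an illustration: the class name is made up, and it makes no assumption about how the server enforces the limit or what it returns when the limit is exceeded.

```python
import time
from collections import deque
from typing import Callable, Deque


class MinuteThrottle:
    """Sliding-window limiter: at most `limit` calls per `window` seconds."""

    def __init__(self, limit: int = 80, window: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.limit = limit
        self.window = window
        self.clock = clock
        self.sent: Deque[float] = deque()  # timestamps of recent calls

    def wait_time(self) -> float:
        """Seconds to wait before the next call is allowed (0.0 if allowed now)."""
        now = self.clock()
        while self.sent and now - self.sent[0] >= self.window:
            self.sent.popleft()  # forget calls that have left the window
        if len(self.sent) < self.limit:
            return 0.0
        return self.window - (now - self.sent[0])

    def record(self) -> None:
        """Note that a query was just sent."""
        self.sent.append(self.clock())
```

Before each lookup, sleep for `wait_time()` seconds and then call `record()`. The `clock` parameter exists only so the class can be tested without real waiting.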

 

 

As always, this is 100% free. If you're having issues with bots scanning your application, crawlers, fraudsters, trolls, people trying to ban evade, etc, try it out.


----------

