# HE BGP Toolkit scraper



## D. Strout (Jul 31, 2014)

Since bgp.he.net has no posted terms of service disallowing it, I wrote a script that, given an ASN, scrapes the site to find the number of IPs it originates and its list of prefixes with descriptions.


```
#!/usr/bin/env php
<?php
if ($argc < 2) exit("Usage: {$argv[0]} <ASN>\n");
$asn = ltrim(trim($argv[1]), "ASas"); //Accept 1234, AS1234 or as1234
if (!ctype_digit($asn)) exit("Invalid ASN!\n"); //is_numeric() would also accept things like "1e3"

$curl = curl_init();
curl_setopt($curl, CURLOPT_URL, "http://bgp.he.net/AS$asn");
curl_setopt($curl, CURLOPT_FOLLOWLOCATION, 1);
curl_setopt($curl, CURLOPT_USERAGENT, "Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/36.0.1985.125 Safari/537.36"); //Scraping with cURL or wget UAs causes 403. UA spoofing here.
curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($curl, CURLOPT_CONNECTTIMEOUT, 2); //Don't take too long to connect
curl_setopt($curl, CURLOPT_TIMEOUT, 60); //...or to download the page

$asPage = curl_exec($curl);
curl_close($curl);
if ($asPage === false) exit("Request failed!\n");
if (strpos($asPage, "did not return any results") !== false) exit("AS not found!\n"); //No results!

$ips = explode("IPs Originated (v4): ", $asPage);
if (!isset($ips[1])) exit("Couldn't find the IP count - has the page layout changed?\n");
$ips = (int) str_replace(",", "", explode("<", $ips[1])[0]); //Parse # of IPs

$prefixes = array();
$prefixHTML = array_slice(explode("/net/", $asPage), 1); //See source of any ASN page to understand this
foreach ($prefixHTML as $prefix) {
    $prefixDesc = explode("<td>", $prefix); //Get description first
    $prefixDesc = isset($prefixDesc[1]) ? trim(explode("<div", $prefixDesc[1])[0]) : "";
    $prefix = explode('"', $prefix)[0]; //Since prefix is easier
    $prefixes[] = "$prefix - $prefixDesc";
}

if ($ips <= 0) exit("No IPs in this ASN.\n"); //Amazing how many empty ASNs there are out there. See for instance AS43297.

file_put_contents("prefixes", implode("\n", $prefixes) . "\n"); //Output full list to file since it's often pretty long.
echo "AS$asn has $ips IPs. The full list of prefixes has been saved to the file `prefixes`.\n"; //Output
?>
```

A little messy, and obviously any use is at your own risk. Be careful with the output: the script overwrites a file called `prefixes` in the current directory, so it's up to you to make sure that doesn't wipe out anything important. This is written to be run from the terminal. Make sure php5-cli (v5.4+) and php5-curl are installed, make the script executable with `chmod +x bgp.php`, then run it with:


```
./bgp.php 1234
#...OR...
./bgp.php AS1234
```
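
If the chained `explode()` calls feel too fragile, the prefixes alone can be pulled out with one regular expression. This is a sketch assuming PHP 7+ and assuming (based only on the script splitting the page on `/net/`) that each prefix appears in the AS page as a link of the form `href="/net/<prefix>"`; `extract_prefixes` is a hypothetical helper name, and descriptions are not handled.

```php
<?php
// Hypothetical helper: collect prefixes from an AS page's HTML.
// Assumption: every prefix is linked as href="/net/<address>/<length>".
function extract_prefixes(string $html): array
{
    preg_match_all('#href="/net/([0-9A-Fa-f.:]+/\d{1,3})"#', $html, $m);
    // The same prefix can be linked more than once; keep each one a single time.
    return array_values(array_unique($m[1]));
}
```

Used on the page fetched by the script above, `extract_prefixes($asPage)` returns an empty array if the markup changes, instead of writing garbage lines to `prefixes`.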


----------



## D. Strout (Jul 31, 2014)

If you just want to output prefixes straight to the terminal, change the last part (starting from foreach) to this:


```
foreach ($prefixHTML as $prefix) {
        $prefixDesc = explode("<td>", $prefix)[1]; //Get description first
        $prefixDesc = trim(explode("<div", $prefixDesc)[0]);
        $prefix = explode('"', $prefix)[0]; //Since prefix is easier
        echo "$prefix //$prefixDesc\n"; //One prefix per line
}
?>
```


----------



## Wintereise (Jul 31, 2014)

It'll ban you for overuse as soon as you put this to any real use.


----------



## trewq (Jul 31, 2014)

You'll get banned. I managed to get banned once just from normal browser usage.


----------



## D. Strout (Jul 31, 2014)

Ah well, we'll see. If I cared enough, I could set this up on several servers, each one fetching at a different interval (plus or minus a few minutes, randomly), pulling from a pool of user agent strings to spoof with. That would probably work. I'm just surprised no one provides an API to access this stuff in a less hack-ish way.


----------



## splitice (Jul 31, 2014)

You can retrieve most (if not all) of the information on bgp.he.net using public route servers, whois and a bunch of other methods.


----------



## Wintereise (Jul 31, 2014)

splitice said:


> You can retrieve most (if not all) of the information on bgp.he.net using public route servers, whois and a bunch of other methods.


All, actually.

If you have a full BGP feed from anyone, you can also use that. IRR data comes from RADB/ARIN/NTTCOM/Savvis/ALTDB, and the rest from public whois databases.
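
As an illustration of the IRR route, here is a sketch (PHP 7+) that asks RADB for the route objects registered with a given origin AS. It assumes `whois.radb.net` answers on TCP port 43 and accepts the IRRd inverse query `-i origin AS<n>`; the function names are hypothetical. Keep in mind that IRR objects are what holders have registered, which is not necessarily what is announced in BGP right now.

```php
<?php
// Send one IRRd-style inverse query and return the raw RPSL text.
function irr_query(string $asn, string $host = 'whois.radb.net'): string
{
    $fp = fsockopen($host, 43, $errno, $errstr, 10);
    if ($fp === false) {
        throw new RuntimeException("whois connect failed: $errstr ($errno)");
    }
    fwrite($fp, "-i origin $asn\r\n");
    $response = (string) stream_get_contents($fp);
    fclose($fp);
    return $response;
}

// Extract the prefixes from the "route:" / "route6:" attributes.
// The same prefix may be registered in several IRR sources, so de-duplicate.
function parse_route_objects(string $response): array
{
    preg_match_all('/^route6?:\s*(\S+)/m', $response, $m);
    return array_values(array_unique($m[1]));
}

// Usage: print_r(parse_route_objects(irr_query('AS36352')));
```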


----------



## splitice (Jul 31, 2014)

The "most" is in relation to *public* route servers. Some have restricted commands.


----------



## Wintereise (Jul 31, 2014)

splitice said:


> The "most" is in relation to *public* route servers. Some have restricted commands.


Not quite. Anything that's actually worthy of being called a route server will allow you to view the table with a `match`/`include <ASN_HERE>`.

That's all the data you need. Any view will work for this, so if one doesn't work, just move on to another.


----------



## D. Strout (Jul 31, 2014)

Wintereise said:


> Not quite. Anything that's actually worthy of being called a route server will allow you to view the table with a `match`/`include <ASN_HERE>`.
> 
> That's all the data you need. Any view will work for this, so if one doesn't work, just move on to another.


This is the first time I've used public route servers. I googled around but didn't find anything that works. What would be the command to list an ASN's IP prefixes?


----------



## splitice (Jul 31, 2014)

See:


ftp.arin.net/pub/stats/arin/delegated-arin-extended-latest

ftp.ripe.net/ripe/stats/delegated-ripencc-latest
ftp.afrinic.net/pub/stats/afrinic/delegated-afrinic-latest
ftp.apnic.net/pub/stats/apnic/delegated-apnic-latest
ftp.lacnic.net/pub/stats/lacnic/delegated-lacnic-latest


----------



## D. Strout (Jul 31, 2014)

splitice said:


> See:
> 
> 
> ftp.arin.net/pub/stats/arin/delegated-arin-extended-latest
> ...


I don't understand - what are those lists saying? I did a search in the ARIN one for 36352 and didn't come up with anything useful.


----------



## splitice (Jul 31, 2014)

At a glance,



> arin|US|ipv4|204.80.139.0|256|19950110|assigned|d6cabb719e071d078b40b4558f8b8039


RIR|Country|Type|Start|IPs|Date Updated?|Status|Something?
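
A sketch for reading these files, assuming the RIR statistics exchange layout `registry|cc|type|start|value|date|status`, where `value` is an address count for ipv4, a prefix length for ipv6 and an ASN count for asn, and where the "extended" files add a final opaque-id column identifying the resource holder. Matching on that opaque-id is one way to relate an `asn` record to the IP blocks held by the same organisation; it is still only a registry view, not a record of what is announced in BGP. The helper names are hypothetical and the code assumes PHP 7.1+.

```php
<?php
// Parse one line; returns null for comments, the version header and summary lines.
function parse_delegated_line(string $line): ?array
{
    $f = explode('|', trim($line));
    // Only real records have at least 7 fields and a type of asn/ipv4/ipv6.
    if (count($f) < 7 || !in_array($f[2], ['asn', 'ipv4', 'ipv6'], true)) {
        return null;
    }
    return [
        'registry'  => $f[0],
        'cc'        => $f[1],
        'type'      => $f[2],
        'start'     => $f[3],          // first ASN or first address
        'value'     => (int) $f[4],    // see lead-in: meaning depends on type
        'date'      => $f[5],          // YYYYMMDD
        'status'    => $f[6],          // e.g. allocated, assigned
        'opaque_id' => isset($f[7]) ? $f[7] : null, // extended files only
    ];
}

// All records sharing an opaque-id with the asn record that covers $asn.
function records_for_asn(array $lines, int $asn): array
{
    $records = array_filter(array_map('parse_delegated_line', $lines));
    $ids = [];
    foreach ($records as $r) {
        if ($r['type'] !== 'asn' || $r['opaque_id'] === null) {
            continue;
        }
        $first = (int) $r['start'];
        if ($asn >= $first && $asn < $first + $r['value']) {
            $ids[$r['opaque_id']] = true;
        }
    }
    return array_values(array_filter($records, function ($r) use ($ids) {
        return $r['opaque_id'] !== null && isset($ids[$r['opaque_id']]);
    }));
}

// Usage: records_for_asn(file('delegated-arin-extended-latest', FILE_IGNORE_NEW_LINES), 36352);
```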


----------



## D. Strout (Jul 31, 2014)

But I need a list that includes ASNs! That's just a list of IP blocks - I don't see any correlation to ASes. In fact, if I take the example you linked and plug it into bgp.he.net, it says no results.


----------



## splitice (Jul 31, 2014)

As I said, at a glance. It sounds like you want someone to do it for you. That's just a list of ASN and IP allocations and their details. It's not a mapping.

Want the origin ASN for an IP block? Use `show ip bgp`. Remember it's entirely possible for a set of IPs to be announced by multiple ASes, just as it's possible for a set of IPs not to be announced by any AS at all (like the one I found at a glance).


```
show ip bgp 206.208.112.0

BGP routing table entry for 206.208.112.0/21
Bestpath Modifiers: deterministic-med
Paths: (14 available, best #6)
Multipath: eBGP
  3356 32408
    pvu-tcore1. (metric 13067) from pye-core1. (pye-core1.)
      Origin IGP, valid, internal
      Community: 
      Originator: 66.110.10.90
  3356 32408
    pvu-tcore1. (metric 13067) from pvu-thar1. (66.110.10.224)
      Origin IGP, valid, internal
      Community: 
      Originator: 66.110.10.90
  3356 32408
    ldn-tcore1. (metric 13057) from l78-mcore3. (Loopback5.mcore3.L78-London.)
      Origin IGP, valid, internal
      Community: 
      Originator: 66.110.10.38
  3356 32408
    fr0-tcore1. (metric 13075) from fr1-thar1. (66.110.11.100)
      Origin IGP, valid, internal
      Community: 
      Originator: 66.110.10.66
  3356 32408
    mln-tcore1. (metric 10463) from mln-tcore1. (66.110.11.81)
      Origin IGP, valid, internal
      Community: 
  3356 32408
    ct8-tcore1. (metric 10010) from ct8-tcore1. (66.110.11.16)
      Origin IGP, valid, internal, best
      Community: 
  3549 32408
    dt8-tcore2. (metric 10100) from dtx-core1. (dtx-core1.)
      Origin IGP, valid, internal
      Community: 
      Originator: 66.110.11.15
  3549 32408
    nto-tcore1. (metric 10052) from nyy-mcore4. (nyy-mcore4.)
      Origin IGP, valid, internal
      Community: 
      Originator: 66.110.11.84
  3549 32408
    nto-tcore1. (metric 10052) from nto-tcore1. (66.110.11.84)
      Origin IGP, valid, internal
      Community: 
  3549 32408
    aeq-tcore2. (metric 10072) from aeq-thar1. (66.110.10.83)
      Origin IGP, valid, internal
      Community: 
      Originator: 66.110.10.252
  3549 32408
    aeq-tcore2. (metric 10072) from aeq-tcore2. (66.110.10.252)
      Origin IGP, valid, internal
      Community: 
  3549 32408
    lvw-tcore2. (metric 10094) from laa-mcore3. (laa-mcore3.)
      Origin IGP, valid, internal
      Community: 
      Originator: 66.110.10.248
  3549 32408
    ct8-tcore2. (metric 10010) from ct8-tcore2. (66.110.11.17)
      Origin IGP, valid, internal
      Community: 
  3549 32408
    pdi-tcore2. (metric 10077) from pdi-mcore4. (pdi-mcore4.)
      Origin IGP, valid, internal
      Community: 
      Originator: 66.110.10.114
```
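
In output like this, the origin AS is the rightmost ASN on each path line (every path above ends in 32408, reached via 3356 or 3549). A sketch (PHP 7+) for collecting the distinct origins from pasted output; it assumes Cisco-style formatting where path lines contain only space-separated ASNs, so AS-sets, "Local" paths and other line types are simply skipped. `origin_asns` is a hypothetical helper.

```php
<?php
// Collect distinct origin ASNs from "show ip bgp <prefix>" output.
function origin_asns(string $output): array
{
    $origins = [];
    foreach (explode("\n", $output) as $line) {
        $line = trim($line);
        // A path line is nothing but ASNs separated by single spaces.
        if (preg_match('/^\d+( \d+)*$/', $line)) {
            $path = explode(' ', $line);
            $origins[end($path)] = true; // rightmost ASN = origin
        }
    }
    return array_keys($origins); // numeric keys come back as ints
}
```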
Or use a third-party service (this one is Shadowserver's; Team Cymru runs a similar one):

```
$ whois -h asn.shadowserver.org "origin 206.208.112.0"
32408 | 206.208.112.0/21 | SMHCOLOCATION | US | ADVANCED-INTERNET-CONSULTING.COM | ADVANCED INTERNET CONSULTING
```
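
The same "origin" query can be sent from PHP over a plain whois connection. This sketch (PHP 7.1+) assumes `asn.shadowserver.org` still answers on TCP port 43 and that each answer line starts with a single origin ASN followed by the covering prefix, separated by `|` as in the output shown; only those two columns are interpreted. The function names are hypothetical.

```php
<?php
function shadowserver_origin(string $ip): array
{
    $fp = fsockopen('asn.shadowserver.org', 43, $errno, $errstr, 10);
    if ($fp === false) {
        throw new RuntimeException("whois connect failed: $errstr ($errno)");
    }
    fwrite($fp, "origin $ip\n");
    $response = (string) stream_get_contents($fp);
    fclose($fp);
    $rows = [];
    foreach (explode("\n", $response) as $line) {
        $row = parse_origin_line($line);
        if ($row !== null) {
            $rows[] = $row;
        }
    }
    return $rows;
}

// Split one answer line into ASN, prefix and the remaining columns.
function parse_origin_line(string $line): ?array
{
    $fields = array_map('trim', explode('|', $line));
    if (count($fields) < 2 || !ctype_digit($fields[0])) {
        return null; // blank line, error text or an unexpected format
    }
    return ['asn' => (int) $fields[0], 'prefix' => $fields[1], 'rest' => array_slice($fields, 2)];
}
```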
There are even bulk-data and DNS-based interfaces.
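
As an example of the DNS approach, here is a sketch using Team Cymru's zone, assuming the IPv4 lookup name is the reversed octets under `origin.asn.cymru.com` and that the TXT answer is a `|`-separated line starting with the origin ASN and BGP prefix. It uses PHP's built-in `dns_get_record()`; the helper names are hypothetical and IPv6 is not handled.

```php
<?php
// Build the query name, e.g. 206.208.112.0 -> 0.112.208.206.origin.asn.cymru.com
function cymru_origin_name(string $ipv4): string
{
    if (filter_var($ipv4, FILTER_VALIDATE_IP, FILTER_FLAG_IPV4) === false) {
        throw new InvalidArgumentException("Not an IPv4 address: $ipv4");
    }
    return implode('.', array_reverse(explode('.', $ipv4))) . '.origin.asn.cymru.com';
}

// One array of trimmed columns per TXT record (several if multiple origins).
function cymru_origin_lookup(string $ipv4): array
{
    $answers = [];
    $records = dns_get_record(cymru_origin_name($ipv4), DNS_TXT);
    foreach ($records ?: [] as $rec) {
        $answers[] = array_map('trim', explode('|', $rec['txt']));
    }
    return $answers;
}
```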


----------



## Wintereise (Jul 31, 2014)

Instead of doing all that fuckery, here, have a proper API -- https://stat.ripe.net/data/announced-prefixes/data.json?preferred_version=1.1&resource=AS36352

Courtesy of the RIPE labs.
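
A sketch (PHP 7+) of consuming that data call instead of scraping. It assumes the JSON response lists the announced prefixes under `data.prefixes`, each entry carrying a `prefix` key, and that `allow_url_fopen` and the openssl extension are available for the HTTPS request. Function names are hypothetical.

```php
<?php
// Pull the prefix strings out of an announced-prefixes JSON document.
function ripestat_prefixes_from_json(string $json): array
{
    $doc = json_decode($json, true);
    if (!is_array($doc) || !isset($doc['data']['prefixes'])) {
        return []; // invalid JSON or an unexpected response shape
    }
    return array_column($doc['data']['prefixes'], 'prefix');
}

function ripestat_prefixes(string $asn): array
{
    $url = 'https://stat.ripe.net/data/announced-prefixes/data.json?preferred_version=1.1&resource='
        . rawurlencode($asn);
    $json = file_get_contents($url);
    if ($json === false) {
        throw new RuntimeException('Request to RIPEstat failed');
    }
    return ripestat_prefixes_from_json($json);
}

// Usage: echo implode("\n", ripestat_prefixes('AS36352')), "\n";
```

Splitting the JSON handling from the HTTP request keeps the parsing easy to check against a saved response.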


----------



## yomero (Jul 31, 2014)

Exactly what I was looking for yesterday to block a whole AS. Thanks everybody.


----------



## D. Strout (Jul 31, 2014)

Wintereise said:


> Instead of doing all that fuckery, here, have a proper API -- https://stat.ripe.net/data/announced-prefixes/data.json?preferred_version=1.1&resource=AS36352
> 
> Courtesy of the RIPE labs.


Now this, I can use.


----------



## rmlhhd (Jul 31, 2014)

Nice script, but as said above, a ban is imminent with that number of requests.


----------



## D. Strout (Jul 31, 2014)

Servaman said:


> Nice script, but as said above, a ban is imminent with that number of requests.


While testing it I probably made one or two dozen requests without getting banned. Now that it's up and working and providing data for whosspamming.us, I only re-run it once a day (though if I'm notified that it's out of date, I can get in and run it in 5 minutes). I'd be surprised if I got banned, and if I do I'll just switch servers and user agents. No problem.

Not really sure why HE isn't a fan of this type of usage; it's very "lightweight" in terms of the resources needed to serve the request.


----------



## dcdan (Jul 31, 2014)

http://whosspamming.us/list.php?provider=all&list=all

How do we (IT7 Networks) get "delisted"? Two records show our prefixes announced for us by B2 as we still have a few boxes with them.


----------



## HalfEatenPie (Jul 31, 2014)

dcdan said:


> http://whosspamming.us/list.php?provider=all&list=all
> 
> How do we (IT7 Networks) get "delisted"? Two records show our prefixes announced for us by B2 as we still have a few boxes with them.


Howdy there! It's probably best for you to move your question to here: 

Also, talk with your upstream provider about getting delisted.


----------



## Kris (Jul 31, 2014)

Server load dropped from a constant .85 - 1 to around .08 - .25 after adding these. No need to query RBLs, waste I/O, etc.

Could be coincidental, but the client's box now gets almost no spam: one every few hours instead of every few minutes.

*Thanks D. Strout, I wish I could give gold for this... Err, that's Reddit.*


----------



## Kris (Jul 31, 2014)

Hey D. Strout, could you maybe dump the raw CC + SM output into a text file every 12 hours for use in firewalls?

*It would drop into the APF firewall very easily, which would fetch them all on restart and auto-update new blocks.*

    ##
    # Global Trust
    ##
    # This is an implementation of the trust rules (allow/deny_hosts) but
    # on a global perspective. You can define below remote addresses from
    # which the glob_allow/deny.rules files should be downloaded from on
    # a daily basis. The files can be maintained in a static fashion by
    # leaving USE_RGT=0, ideal for a host serving the files.
    USE_RGT="0"

    GA_URL="yourhost.com/glob_allow.rules"
    GA_URL_PROT="http"

    GD_URL="yourhost.com/glob_deny.rules"
    GD_URL_PROT="http"

----------



## gbshouse (Sep 12, 2014)

Try http://www.team-cymru.org/Services/ip-to-asn.html or http://www.cidr-report.org/as2.0/
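
For many addresses at once, Team Cymru's whois service has a bulk mode. This sketch (PHP 7+) assumes `whois.cymru.com` on TCP port 43 accepts `begin`, an optional `verbose` flag, one IP per line and then `end`, and answers with `|`-separated rows whose first column is the origin ASN; the banner, the column header and unrouted addresses (reported as `NA`) are skipped. Function names are hypothetical.

```php
<?php
// Build the bulk request, silently dropping anything that isn't an IP.
function cymru_bulk_query(array $ips): string
{
    $valid = array_filter(array_map('trim', $ips), function ($ip) {
        return filter_var($ip, FILTER_VALIDATE_IP) !== false;
    });
    return "begin\nverbose\n" . implode("\n", $valid) . "\nend\n";
}

// Keep only data rows, i.e. rows whose first column is a numeric ASN.
function cymru_bulk_parse(string $response): array
{
    $rows = [];
    foreach (explode("\n", $response) as $line) {
        $f = array_map('trim', explode('|', $line));
        if (count($f) >= 3 && ctype_digit($f[0])) {
            $rows[] = $f;
        }
    }
    return $rows;
}

function cymru_bulk_lookup(array $ips): array
{
    $fp = fsockopen('whois.cymru.com', 43, $errno, $errstr, 10);
    if ($fp === false) {
        throw new RuntimeException("whois connect failed: $errstr ($errno)");
    }
    fwrite($fp, cymru_bulk_query($ips));
    $response = (string) stream_get_contents($fp);
    fclose($fp);
    return cymru_bulk_parse($response);
}
```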

bgp.he.net is not the best or most accurate source.


----------

