
HE BGP Toolkit scraper

D. Strout

Resident IPv6 Proponent
Since bgp.he.net has no posted terms of service disallowing it, I wrote a script that, given an ASN, scrapes the site to find the number of IPs it originates and the list of its prefixes with their descriptions.


Code:
<?php
//Accept "1234" or "AS1234" (any case) as the first argument
$asn = isset($argv[1]) ? preg_replace('/^AS/i', '', trim($argv[1])) : "";
if (!ctype_digit($asn)) exit("Invalid ASN!\n"); //Digits only; is_numeric() would also accept "1.5" or "1e3"

$curl = curl_init();
curl_setopt($curl, CURLOPT_URL, "http://bgp.he.net/AS$asn");
curl_setopt($curl, CURLOPT_FOLLOWLOCATION, 1);
curl_setopt($curl, CURLOPT_USERAGENT, "Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/36.0.1985.125 Safari/537.36"); //Scraping with cURL or wget UAs causes 403. UA spoofing here.
curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($curl, CURLOPT_CONNECTTIMEOUT, 2); //Don't take too long
curl_setopt($curl, CURLOPT_TIMEOUT, 60);

$asPage = curl_exec($curl);
if ($asPage === false) exit("Request failed: " . curl_error($curl) . "\n");
if (strpos($asPage, "did not return any results") !== false) exit("AS not found!\n"); //No results! strpos() can return 0, so compare against false
$ipParts = explode("IPs Originated (v4): ", $asPage);
$ips = isset($ipParts[1]) ? (int) str_replace(",", "", explode("<", $ipParts[1])[0]) : 0; //Parse # of IPs
if ($ips <= 0) exit("No IPs in this ASN.\n"); //Amazing how many empty ASNs there are out there. See for instance AS43297.

$prefixes = array();
$prefixHTML = array_slice(explode("/net/", $asPage), 1); //See source of any ASN page to understand this. array_slice() because array_splice() wants a variable by reference
foreach ($prefixHTML as $prefix) {
    $cells = explode("<td>", $prefix); //Get description first
    $prefixDesc = isset($cells[1]) ? trim(explode("<div", $cells[1])[0]) : "";
    $prefix = explode('"', $prefix)[0]; //Since prefix is easier
    $prefixes[] = "$prefix - $prefixDesc";
}

file_put_contents("prefixes", implode("\n", $prefixes)); //Output full list to file since it's often pretty long.
echo "AS$asn has $ips IPs. The full list of prefixes has been saved to the file `prefixes`.\n"; //Output
?>

A little messy, and obviously any use is at your own risk. Careful with that output: the script overwrites a file named `prefixes` in the current directory, so it's up to you to make sure it doesn't wipe out anything important. This is written to be run from the terminal. Make sure php5-cli (v 5.4+) and php5-curl are installed, then run with:

Code:
php bgp.php 1234
#...OR...
php bgp.php AS1234
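For anyone wondering what the explode() chain is doing, here is a small offline sketch. The HTML fragment is made up to match what the script expects (a link to /net/PREFIX followed by a <td> holding the description and a <div>); the real bgp.he.net markup may differ, and extractPrefixes() is just a name picked for this illustration.

```php
<?php
// Hypothetical fragment shaped the way the scraper assumes; not copied from the site.
$sampleHtml = '<table>'
    . '<tr><td><a href="/net/192.0.2.0/24">192.0.2.0/24</a></td>'
    . '<td>Example Net One <div class="flag"></div></td></tr>'
    . '<tr><td><a href="/net/198.51.100.0/24">198.51.100.0/24</a></td>'
    . '<td>Example Net Two <div class="flag"></div></td></tr>'
    . '</table>';

function extractPrefixes($html) {
    $prefixes = array();
    // Everything before the first "/net/" is page header, so drop chunk 0.
    $chunks = array_slice(explode("/net/", $html), 1);
    foreach ($chunks as $chunk) {
        // Each chunk starts with the prefix, ended by the href's closing quote.
        $prefix = explode('"', $chunk)[0];
        // The description is the text of the next <td>, up to the first <div.
        $cells = explode("<td>", $chunk);
        $desc = isset($cells[1]) ? trim(explode("<div", $cells[1])[0]) : "";
        $prefixes[] = "$prefix - $desc";
    }
    return $prefixes;
}

print_r(extractPrefixes($sampleHtml));
```

String splitting like this breaks as soon as the page layout changes, which is the usual price of scraping.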
 

D. Strout

Resident IPv6 Proponent
If you just want to output prefixes straight to terminal, change the last part (starting from foreach) to this:

Code:
foreach ($prefixHTML as $prefix) {
        $prefixDesc = explode("<td>", $prefix)[1]; //Get description first
        $prefixDesc = trim(explode("<div", $prefixDesc)[0]);
        $prefix = explode('"', $prefix)[0]; //Since prefix is easier
        echo "$prefix //$prefixDesc\n";
}
?>
 

trewq

Active Member
Verified Provider
You'll get banned. I managed to get banned once just from normal browser usage.
 

D. Strout

Resident IPv6 Proponent
Ah well, we'll see. If I cared enough, I could set this up on several servers, each one fetching at a different interval (plus or minus a few minutes, randomly), pulling from a pool of user agent strings to spoof with. That would probably work. I'm just surprised no one provides an API to access this stuff in a less hack-ish way.
 

splitice

Just a little bit crazy...
Verified Provider
You can retrieve most (if not all) of the information on bgp.he.net using public route servers, whois and a bunch of other methods. 
 

Wintereise

New Member
splitice said: "You can retrieve most (if not all) of the information on bgp.he.net using public route servers, whois and a bunch of other methods."
All, actually.

If you have a full BGP feed from anyone, you can also use that. IRR data from radb/arin/nttcom/savvis/altdb, rest from public whois dbs.
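A rough sketch of the IRR route, assuming RADB's whois server (whois.radb.net, port 43) accepts the RIPE-style inverse query `-i origin AS<number>`, which as far as I know it does. The function names are made up for this example, and the sample response is invented, not real registry data. Keep in mind IRR objects are what people registered, not necessarily what is announced right now.

```php
<?php
// Pull the route:/route6: values out of an IRR whois response (RPSL text).
function parseRouteObjects($whoisText) {
    $routes = array();
    foreach (preg_split('/\r?\n/', $whoisText) as $line) {
        if (preg_match('/^route6?:\s*(\S+)/i', $line, $m)) {
            $routes[] = $m[1];
        }
    }
    // The same prefix can be registered more than once; keep one copy.
    return array_values(array_unique($routes));
}

// Ask an IRR whois server for every route object whose origin is the given ASN.
// Returns an array of prefixes, or false if the connection or read fails.
function fetchIrrRoutes($asn, $server = "whois.radb.net") {
    $sock = @fsockopen($server, 43, $errno, $errstr, 10);
    if ($sock === false) return false;
    fwrite($sock, "-i origin AS$asn\r\n");
    $response = stream_get_contents($sock);
    fclose($sock);
    if ($response === false) return false;
    return parseRouteObjects($response);
}

// Offline demonstration with an invented response.
$sample = "route:      192.0.2.0/24\norigin:     AS64500\n\n"
        . "route6:     2001:db8::/32\norigin:     AS64500\n";
print_r(parseRouteObjects($sample));
```

To try it against the live server, call fetchIrrRoutes("1234") with the ASN you care about.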
 

splitice

Just a little bit crazy...
Verified Provider
The "most" was in relation to public route servers. Some have restricted commands.
 

Wintereise

New Member
splitice said: "The "most" was in relation to public route servers. Some have restricted commands."
Not quite. Anything that's actually worthy of being called a route server will allow you to view the table with a match/include <ASN_HERE>.

That's all the data you need, any view will work for this -- so if one doesn't work, just move on to another, etc.
 

D. Strout

Resident IPv6 Proponent
Wintereise said: "Not quite. Anything that's actually worthy of being called a route server will allow you to view the table with a match/include <ASN_HERE>. That's all the data you need, any view will work for this -- so if one doesn't work, just move on to another, etc."

This is the first time I've used public route servers. I Googled around but didn't find anything that works. What would be the command to list an ASN's IP prefixes?
 

splitice

Just a little bit crazy...
Verified Provider
At a glance,

arin|US|ipv4|204.80.139.0|256|19950110|assigned|d6cabb719e071d078b40b4558f8b8039
 RIR|Country|Type|Start|IPs|Date Updated?|Status|Something?
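That line looks like a row from the RIR "delegated" statistics files. A minimal sketch of parsing it, assuming the usual field order (registry, country, type, start, value, date, status, plus an opaque holder ID in the extended files); the array keys and function names are my own labels. As far as I know the date field is when the block was allocated or assigned, and for ipv4 rows the value is a count of addresses.

```php
<?php
// Split one delegated-stats row into named fields; returns null for short rows.
function parseDelegatedLine($line) {
    $f = explode("|", trim($line));
    if (count($f) < 7) return null;
    return array(
        "registry"  => $f[0],
        "country"   => $f[1],
        "type"      => $f[2],
        "start"     => $f[3],
        "count"     => (int) $f[4],
        "date"      => $f[5],
        "status"    => $f[6],
        "opaque_id" => isset($f[7]) ? $f[7] : null,
    );
}

// For ipv4 rows: express start + count as CIDR when the count is a power of two.
// Assumes the start address is aligned to the block size; returns null otherwise unknown sizes.
function ipv4RangeToCidr($start, $count) {
    if ($count < 1 || ($count & ($count - 1)) !== 0) return null;
    $bits = 0;
    while ((1 << $bits) < $count) $bits++;
    return $start . "/" . (32 - $bits);
}

$row = parseDelegatedLine("arin|US|ipv4|204.80.139.0|256|19950110|assigned|d6cabb719e071d078b40b4558f8b8039");
echo ipv4RangeToCidr($row["start"], $row["count"]), "\n"; // 204.80.139.0/24
```

Note this only tells you who holds the block, not which AS (if any) announces it.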
 

splitice

Just a little bit crazy...
Verified Provider
As I said, at a glance. It sounds like you want someone to do it for you. That's just a list of ASNs / IP allocations and their details. It's not a mapping.

Getting the origin ASN from an IP block? Use show ip bgp. Remember it's entirely possible to announce a set of IPs from multiple ASes, just as it's possible to have a set of IPs not announced via any AS (like the ones I found at a glance).

Code:
show ip bgp 206.208.112.0

BGP routing table entry for 206.208.112.0/21
Bestpath Modifiers: deterministic-med
Paths: (14 available, best #6)
Multipath: eBGP
  3356 32408
    pvu-tcore1. (metric 13067) from pye-core1. (pye-core1.)
      Origin IGP, valid, internal
      Community: 
      Originator: 66.110.10.90
  3356 32408
    pvu-tcore1. (metric 13067) from pvu-thar1. (66.110.10.224)
      Origin IGP, valid, internal
      Community: 
      Originator: 66.110.10.90
  3356 32408
    ldn-tcore1. (metric 13057) from l78-mcore3. (Loopback5.mcore3.L78-London.)
      Origin IGP, valid, internal
      Community: 
      Originator: 66.110.10.38
  3356 32408
    fr0-tcore1. (metric 13075) from fr1-thar1. (66.110.11.100)
      Origin IGP, valid, internal
      Community: 
      Originator: 66.110.10.66
  3356 32408
    mln-tcore1. (metric 10463) from mln-tcore1. (66.110.11.81)
      Origin IGP, valid, internal
      Community: 
  3356 32408
    ct8-tcore1. (metric 10010) from ct8-tcore1. (66.110.11.16)
      Origin IGP, valid, internal, best
      Community: 
  3549 32408
    dt8-tcore2. (metric 10100) from dtx-core1. (dtx-core1.)
      Origin IGP, valid, internal
      Community: 
      Originator: 66.110.11.15
  3549 32408
    nto-tcore1. (metric 10052) from nyy-mcore4. (nyy-mcore4.)
      Origin IGP, valid, internal
      Community: 
      Originator: 66.110.11.84
  3549 32408
    nto-tcore1. (metric 10052) from nto-tcore1. (66.110.11.84)
      Origin IGP, valid, internal
      Community: 
  3549 32408
    aeq-tcore2. (metric 10072) from aeq-thar1. (66.110.10.83)
      Origin IGP, valid, internal
      Community: 
      Originator: 66.110.10.252
  3549 32408
    aeq-tcore2. (metric 10072) from aeq-tcore2. (66.110.10.252)
      Origin IGP, valid, internal
      Community: 
  3549 32408
    lvw-tcore2. (metric 10094) from laa-mcore3. (laa-mcore3.)
      Origin IGP, valid, internal
      Community: 
      Originator: 66.110.10.248
  3549 32408
    ct8-tcore2. (metric 10010) from ct8-tcore2. (66.110.11.17)
      Origin IGP, valid, internal
      Community: 
  3549 32408
    pdi-tcore2. (metric 10077) from pdi-mcore4. (pdi-mcore4.)
      Origin IGP, valid, internal
      Community: 
      Originator: 66.110.10.114
Or using a third-party service (Shadowserver in this example; Team Cymru runs a similar one):
Code:
$ whois -h asn.shadowserver.org "origin 206.208.112.0"
32408 | 206.208.112.0/21 | SMHCOLOCATION | US | ADVANCED-INTERNET-CONSULTING.COM | ADVANCED INTERNET CONSULTING
There are even bulk-data and DNS-based interfaces.
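For the DNS-based route, a rough sketch assuming Team Cymru's `origin.asn.cymru.com` zone, which to my knowledge answers TXT queries for the reversed IPv4 octets with a pipe-delimited record. The function names are invented for this example, and the exact field layout of the answer should be checked against the service's own documentation.

```php
<?php
// Build the TXT query name for an IPv4 address: octets reversed, zone appended.
function originQueryName($ip, $zone = "origin.asn.cymru.com") {
    if (filter_var($ip, FILTER_VALIDATE_IP, FILTER_FLAG_IPV4) === false) return null;
    return implode(".", array_reverse(explode(".", $ip))) . "." . $zone;
}

// Split a pipe-delimited answer (whois or DNS) into trimmed fields.
function splitPipeFields($line) {
    return array_map('trim', explode("|", $line));
}

// Live lookup; returns the fields of the first TXT record, or false on failure.
function lookupOrigin($ip) {
    $name = originQueryName($ip);
    if ($name === null) return false;
    $records = @dns_get_record($name, DNS_TXT);
    if (empty($records)) return false;
    return splitPipeFields($records[0]["txt"]);
}

echo originQueryName("206.208.112.0"), "\n"; // 0.112.208.206.origin.asn.cymru.com

// The same splitter works on the whois line shown above.
$fields = splitPipeFields("32408 | 206.208.112.0/21 | SMHCOLOCATION | US | ADVANCED-INTERNET-CONSULTING.COM | ADVANCED INTERNET CONSULTING");
echo "AS{$fields[0]} originates {$fields[1]}\n";
```

DNS answers get cached by resolvers, so this is also kinder to the service than hammering whois in a loop.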
 

rmlhhd

Active Member
Verified Provider
Nice script, but as said above, banning is imminent with that amount of requests.
 

D. Strout

Resident IPv6 Proponent
rmlhhd said: "Nice script, but as said above, banning is imminent with that amount of requests."
While testing it I probably made one or two dozen requests without getting banned. Now that it's up and working and providing data for whosspamming.us, I only re-run it once a day (though if I'm notified that it is out of date, I can get in and run it in 5 minutes). I'd be surprised if I got banned, and if I do I'll just switch servers and user agents. No problem.

Not really sure why HE isn't a fan of this type of usage; it's very "lightweight" in terms of the resources necessary to service the request.
 