wlanboy.com - how it started

wlanboy

Content Contributer
So this is the short story of wlanboy.com, a Twitter archive dedicated to VPS providers.


Twitter and all other social networks are blocked at work, so I needed a remote Twitter client.


The main reason was that I kept missing campaigns.


Yup, I missed some good coupon codes and therefore wanted to build a tool that searches for them.


And, of course, one that notifies me about any deals.


The next feature was ignoring dinky tweets (ones without likes, retweets, etc.) and finding new Twitter accounts that might be interesting (via followers, following, and retweets).
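A minimal Ruby sketch of the dinky-tweet filter and a naive coupon search, assuming each stored tweet carries like/retweet/reply counters. The `Tweet` struct, the `dinky?`, `interesting`, and `coupon_code` helpers, and the coupon regex are hypothetical illustrations, not the actual wlanboy.com code:

```ruby
# Hypothetical tweet record; field names are assumptions, not Twitter's API.
Tweet = Struct.new(:id, :text, :likes, :retweets, :replies, keyword_init: true)

# Assumption: a tweet counts as "dinky" when nobody interacted with it at all.
def dinky?(tweet)
  (tweet.likes + tweet.retweets + tweet.replies).zero?
end

# Keep only tweets that got at least some reaction.
def interesting(tweets)
  tweets.reject { |t| dinky?(t) }
end

# Hypothetical coupon pattern: the word "code" followed by an upper-case token.
COUPON = /\bcode:?\s+([A-Z0-9]{4,})\b/

# Returns the captured coupon code, or nil when the text has none.
def coupon_code(tweet)
  tweet.text[COUPON, 1]
end

tweets = [
  Tweet.new(id: 1, text: "VPS deal: 50% off with code SPRING", likes: 12, retweets: 4, replies: 1),
  Tweet.new(id: 2, text: "Good morning", likes: 0, retweets: 0, replies: 0)
]
puts interesting(tweets).map(&:id).inspect  # => [1]
puts coupon_code(tweets.first)              # => SPRING
```

A real coupon search would need more patterns; this one only catches "code" followed by an upper-case token.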


After some time I added lists to manage the stored profiles and tweets.


A list is a group of Twitter accounts that share the same topic.


Like:

  • .Net
  • Java
  • VPS providers
  • Redis / MongoDB
  • Ruby / Gems
  • etc

A profile represents a Twitter user. It includes information about how often he/she tweets, so that the workers do not poll profiles that have nothing new, and about the quality of the tweets. So a profile can be deactivated if someone only tweets about his dog.
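One way the "how often does he/she tweet" information could drive polling, as a sketch: the average gap between known tweets decides when a profile is due again, and a deactivated profile is never polled. `Profile`, `average_interval`, `due?`, and the one-day fallback are assumptions made for illustration:

```ruby
# Hypothetical profile record; the fields are illustrative assumptions,
# not the actual wlanboy.com schema.
Profile = Struct.new(:name, :tweet_times, :last_polled, :active, keyword_init: true)

# Assumed fallback when there is too little history: poll once a day.
DEFAULT_INTERVAL = 24 * 3600

# Average gap between the profile's known tweets in seconds (nil if < 2 tweets).
def average_interval(profile)
  times = profile.tweet_times.sort
  return nil if times.size < 2

  gaps = times.each_cons(2).map { |a, b| b - a }
  gaps.sum / gaps.size
end

# A profile is due when it is active and at least one average tweet interval
# has passed since the last poll.
def due?(profile, now = Time.now)
  return false unless profile.active

  interval = average_interval(profile) || DEFAULT_INTERVAL
  now - profile.last_polled >= interval
end

now = Time.utc(2015, 3, 10, 12, 0, 0)
chatty = Profile.new(name: "chatty_host", tweet_times: [now - 7200, now - 3600, now - 1800],
                     last_polled: now - 3600, active: true)
quiet  = Profile.new(name: "quiet_host", tweet_times: [now - 10 * 86_400, now - 5 * 86_400],
                     last_polled: now - 86_400, active: true)
dog    = Profile.new(name: "dog_fan", tweet_times: [], last_polled: now - 86_400, active: false)

puts [chatty, quiet, dog].select { |prof| due?(prof, now) }.map(&:name).inspect  # => ["chatty_host"]
```

The chatty profile tweets roughly every 45 minutes and was last polled an hour ago, so it is due; the quiet one tweets every five days and is skipped; the deactivated one is never polled.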


The system itself is built from the following components:

  • A Ruby app running on the Thin web server that serves the webpage
  • A MongoDB cluster holding all data
  • A RabbitMQ cluster that load-balances and distributes workloads
  • A cronjob that creates workloads
  • A bunch of workers that listen on RabbitMQ queues for work

A simple cronjob (a Ruby script) loops through all profiles to check whether a Twitter account should be updated.


It creates a workload item (including the latest tweet ID) and sends it to a RabbitMQ topic.


One of the workers fetches the work order, scrapes the Twitter profile, stores all new tweets, and updates the profile.
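The cronjob/worker round trip can be sketched in plain Ruby. To keep it self-contained, a `Thread::Queue` stands in for the RabbitMQ topic and Hashes stand in for MongoDB; the work order is a small JSON message whose field names are assumptions, and `scrape_profile` is a hypothetical stub. This shows the flow only, not the production code:

```ruby
require 'json'

# Stand-ins (assumptions, not the real setup): a Thread::Queue plays the
# RabbitMQ topic, plain Hashes play the MongoDB collections.
WORK_QUEUE = Queue.new
TWEETS     = Hash.new { |h, k| h[k] = [] }
PROFILES   = {
  "host_a" => { latest_tweet_id: 100, active: true },
  "host_b" => { latest_tweet_id: 200, active: false }
}

# Cronjob side: one JSON work order (incl. latest tweet id) per active profile.
def enqueue_work_orders
  PROFILES.each do |name, prof|
    next unless prof[:active]

    WORK_QUEUE << JSON.generate({ "profile" => name, "since_id" => prof[:latest_tweet_id] })
  end
end

# Hypothetical scraper stub: pretends two newer tweets exist.
def scrape_profile(name, since_id)
  [{ id: since_id + 1, text: "#{name} tweet 1" }, { id: since_id + 2, text: "#{name} tweet 2" }]
end

# Worker side: take one order, scrape, store only new tweets, update profile.
def work_off(order_json)
  order = JSON.parse(order_json)
  name, since_id = order["profile"], order["since_id"]
  new_tweets = scrape_profile(name, since_id).select { |t| t[:id] > since_id }
  return if new_tweets.empty?

  TWEETS[name].concat(new_tweets)
  PROFILES[name][:latest_tweet_id] = new_tweets.map { |t| t[:id] }.max
end

enqueue_work_orders
WORK_QUEUE.close # no more orders in this run; pop returns nil once drained
worker = Thread.new do
  while (order = WORK_QUEUE.pop)
    work_off(order)
  end
end
worker.join
puts PROFILES["host_a"][:latest_tweet_id]  # => 102
```

With a real broker, the `WORK_QUEUE <<` line would become a publish to the topic and the worker loop a queue subscription; the shape of the work order stays the same.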


Currently only the "VPS provider" list is public and can be accessed through wlanboy.com.


Across all lists, about 3700 Twitter profiles are stored in the database as of today.


The "VPS provider" list includes 37 active profiles (and about 20 disabled ones).


After the homepage was finished, I added some additional services:

  • Full-text search, with or without a profile name
  • An RSS feed of all newly added tweets
  • Statistics about the number of tweets per day
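The per-day statistics boil down to a group-and-count. A sketch in Ruby over an in-memory array, assuming each tweet has a `created_at` Time (a hypothetical field name):

```ruby
require 'date'

# Hypothetical input: each stored tweet is a Hash with a :created_at Time.
# Counts tweets per calendar day (UTC), sorted by day.
def tweets_per_day(tweets)
  tweets.group_by { |t| t[:created_at].getutc.to_date }
        .transform_values(&:count)
        .sort
        .to_h
end

tweets = [
  { id: 1, created_at: Time.utc(2015, 3, 9, 8, 0) },
  { id: 2, created_at: Time.utc(2015, 3, 9, 17, 30) },
  { id: 3, created_at: Time.utc(2015, 3, 10, 9, 15) }
]
tweets_per_day(tweets).each { |day, n| puts "#{day}: #{n}" }
# 2015-03-09: 2
# 2015-03-10: 1
```

In MongoDB the same grouping could run server-side with a `$group` stage in an aggregation pipeline instead of loading every tweet into Ruby.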

Yesterday I switched the whole domain to SSL-only.


It was about time to do that after vpsBoard made the switch.


It was quite a hassle to switch everything to local files so that every file is served from my own domain (even the fonts referenced in the CSS).


Today I will start publishing my tutorials on my homepage too (for those who want them without the IP.Board HTML bloat).


For me it is exciting to see what can grow out of a simple terminal script, if you don't stop after the second month.
 

peterw

New Member
Nice insight. Hopefully others will write about their motives and the structure of their projects too. I think this post is an invitation to ask questions. I'll start with a few :)

Question 1: How do you parse the websites? What libs do you use?

Question 2: Why don't you make money with all the information and tutorials?

Question 3: Why did you choose MongoDB? Hype?
 

wlanboy

Content Contributer
  1. HTML can be treated as XML, so I am using XPath to pick the elements I want.
    Need a tool to get the XPath of an element on an existing website?

    Use the Firefox plugin Firebug.

    In the HTML view you can select any element, right-click it, and copy the XPath for that element.
  2. Most of my knowledge is based on free resources:
    public libraries, free online material, and people like me writing tutorials about things they managed to get running.

    I don't invent groundbreaking stuff; I only edit and enhance already existing knowledge, hopefully making it more accessible to beginners.

    It would simply not be OK to make money with that.
  3. No hype, it just worked for my interface. I decided to use JSON for all communication, and since MongoDB stores JSON-like documents, it is easy to add new fields.

    It does have some drawbacks, but the advantages outweigh them.
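To illustrate answer 1, here is a small XPath example using REXML from Ruby's standard distribution. The markup and the XPath expressions are made up; REXML only accepts well-formed markup, so real-world HTML usually needs a more lenient parser (Nokogiri is the common choice in Ruby):

```ruby
require 'rexml/document'

# Made-up, well-formed sample page; real pages are fetched first.
html = <<~HTML
  <html>
    <body>
      <div class="tweet" data-id="101"><p class="text">Coupon: SAVE20</p></div>
      <div class="tweet" data-id="102"><p class="text">Maintenance tonight</p></div>
    </body>
  </html>
HTML

doc = REXML::Document.new(html)

# Hypothetical XPaths; in practice they would be copied from the browser's
# HTML inspector for the element of interest.
tweets = REXML::XPath.match(doc, "//div[@class='tweet']").map do |div|
  { id: div.attributes["data-id"].to_i,
    text: REXML::XPath.first(div, ".//p[@class='text']").text }
end

tweets.each { |t| puts "#{t[:id]}: #{t[:text]}" }
# 101: Coupon: SAVE20
# 102: Maintenance tonight
```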
 

peterw

New Member
Thank you for the hint to use XPath.

People who copy others' work, or raffle off some free VPSes to get people to write for them, should read #2 twice. I hate people with their ad-polluted how-to websites full of outdated and never-tested content.
 

mikho

Not to be taken seriously, ever!
Was that a subtle remark towards me?
 

peterw

New Member
It was my anger about a lot of how-to websites. Not every point applies to you. You do not steal content.

But you are selling ads

 

mikho

Not to be taken seriously, ever!

And so does vpsBoard. I don't see the difference.
 

wlanboy

Content Contributer
Please calm down.

Mikho collects tutorials and credits each author, so I do not see anything bad here.

His website is good and his WordPress theme is well done. I know from experience that it is not easy to build a site like this with WordPress.

I would like to see more people collecting knowledge; hey, even I started collecting my own stuff.

And it is totally fine to place ads if it helps him to cover costs.
 

wlanboy

Content Contributer
Updated the whole layout to take a more minimalist approach with normalize.css and font-awesome.css.
 