Thursday, February 2, 2012

How I dumped the profile pics of the first 10,000 Facebook users within a few hours


This post was originally published by Debasish Mandal on his website, www.debasish.in.
A clip from the movie "The Social Network"


Hi all. In this article I am going to explain how I downloaded the profile / cover pictures of the first ten thousand Facebook users within a few hours, using a Python script of about 100 lines. To do this I used the Facebook Graph API and an HTML comment present in the profile page of Facebook (you will get to know more about this later on).

So what is the Facebook Graph API?

Using the Facebook Graph API you can retrieve some profile information about a Facebook user, such as the profile id, first name, last name, Facebook username, gender and locale.
To get this information, all you have to do is access the following URL.

http://graph.facebook.com/?id=<target profile id>

Just replace the id parameter with the id you want to look up. One important thing is that the API returns false if the id is not valid. For example, if you try to access id=1 the API will return false, because that is not a valid Facebook id. But if you change the parameter to 4, the API returns the above-mentioned information for Mark Zuckerberg. I am going to use the Graph API to check whether a target profile id is valid or not. You might wonder why I used this API, since the same thing can be done by accessing http://www.facebook.com/1, http://www.facebook.com/2, http://www.facebook.com/3 and so on. My answer is that the API is lightweight: you don't have to craft each and every HTTP header just to check for a valid profile id.
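The validity check described above comes down to one comparison on the response body. Here is a minimal sketch in Python 3 (the script later in this post is Python 2), assuming the API behaves as described here, that is, the body is the literal text false for an invalid id. The helper names graph_url and is_valid_profile are made up for illustration, and no request is actually sent:

```python
GRAPH_HOST = "http://graph.facebook.com"


def graph_url(profile_id):
    """Build the lookup URL for a numeric profile id."""
    return "%s/?id=%d" % (GRAPH_HOST, int(profile_id))


def is_valid_profile(body):
    """Return True unless the API answered with the literal text 'false'."""
    return body.strip() != "false"


# The response bodies below are typed in by hand, not fetched.
print(graph_url(4))
print(is_valid_profile("false"))
print(is_valid_profile('{"id": "4", "name": "Mark Zuckerberg"}'))
```

Keeping the check separate from the network call makes it easy to try against saved responses.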

Another feature of the Graph API is getting the like and share counts of any link. The API returns, as JSON, the number of times a link has been shared or liked on Facebook. You can query it this way:


http://graph.facebook.com/?id=http://www.google.com/
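A rough Python 3 sketch of reading such a count. The field name "shares" and the sample response body are assumptions made up for this example, since the post does not show the actual JSON. The helper names are hypothetical too, and the link is URL-encoded here to be on the safe side:

```python
import json
from urllib.parse import urlencode


def share_count_url(link):
    """Build the Graph API URL that reports counts for a link."""
    return "http://graph.facebook.com/?" + urlencode({"id": link})


def parse_share_count(body, field="shares"):
    """Read the count from a JSON body; 0 if the field is missing.

    The default field name "shares" is an assumption, not taken from the post.
    """
    data = json.loads(body)
    if not isinstance(data, dict):
        return 0
    return int(data.get(field, 0))


# Hypothetical response body, made up for this example.
sample = '{"id": "http://www.google.com/", "shares": 1234}'
print(share_count_url("http://www.google.com/"))
print(parse_share_count(sample))
```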

Another thing you can do with the Facebook Graph API is block detection. If a user tries to access an invalid profile (for example http://www.facebook.com/profile.php?id=random_number), the application takes the user to a page saying “The page you requested was not found.” If the user has been blocked by someone, the application shows the same page for that person's profile. So if the Graph API reports the id as valid while the profile page says “not found”, you can easily tell that you have been blocked by that user.
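The block-detection idea can be written down as a small decision function. This is only my reading of the paragraph above, sketched in Python 3. The function name and the three labels are made up, and the two inputs are assumed to come from a Graph API lookup and a logged-in profile page request respectively:

```python
def classify_profile(graph_says_valid, profile_page_found):
    """Combine the two observations into a rough verdict."""
    if profile_page_found:
        # The profile page loaded normally, so nothing is hidden from us.
        return "visible"
    if graph_says_valid:
        # The id exists according to the Graph API, yet the page is "not found".
        return "possibly blocked"
    # Both sources agree that there is no such profile.
    return "does not exist"


print(classify_profile(graph_says_valid=True, profile_page_found=False))
```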

An interesting html comment:

If you look at the source of any Facebook user's profile page while you are logged in, you will find that Facebook returns the actual image location of the profile / cover picture inside an HTML comment.

For example, Mark Zuckerberg's Facebook profile is http://www.facebook.com/zuck. By inspecting the page we can find the image location of his current cover picture, which is

http://a1.sphotos.ak.fbcdn.net/hphotos-ak-ash4/311205_989690200741_4_42618747_1231438675_n.jpg

Looking at the source code of his profile page, I found that the application discloses this image path (http://a1.sphotos.ak.fbcdn.net/hphotos-ak-ash4/311205_989690200741_4_42618747_1231438675_n.jpg) inside an HTML comment like this:

<!-- <div class="fbTimelineTopSectionBase"><div id="pagelet_above_header_timeline" data-referrer="pagelet_above_header_timeline"></div><div id="above_header_timeline_placeholder"></div><div class="fbTimelineSection mtm fbTimelineTopSection"><div id="fbProfileCover"><div class="cover" style="margin-top: -115px;" data-collapse="115"><a class="coverWrap coverImage" href="http://www.facebook.com/photo.php?fbid=989690200741&amp;set=a.941146602501.2418915.4&amp;type=1" rel="theater" id="fbCoverImageContainer"><img class="photo img" src="http://a1.sphotos.ak.fbcdn.net/hphotos-ak-ash4/311205_989690200741_4_42618747_1231438675_n.jpg" style="top:0px;width:100%;" data-fbid="989690200741" alt="Cover Photo" /> -referrer="pagelet_timeline_nav"></div><div id="pagelet_above_header_not_timeline" data-referrer="pagelet_above_header_not_timeline"></div></div></div><div id="timeline_tab_content"><div id="pagelet_escape_hatch" data-referrer="pagelet_escape_hatch"></div><div id="pagelet_timeline_recent" data-referrer="pagelet_timeline_recent"></div></div> -->


One important thing about this: the application does not return the HTML comment if you are not logged in to Facebook.

So this HTML comment makes it much easier to mass download Facebook users' profile / cover pictures.
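To show how the image location can be pulled out of such a comment, here is a minimal Python 3 sketch. It assumes the img tag always carries class="photo img" as in the snippet above. The sample string is a shortened copy of that snippet and the function name is made up:

```python
import re

# Capture everything between the quotes of the src attribute.
IMG_PATTERN = re.compile(r'<img class="photo img" src="([^"]+)"')


def extract_image_url(page_source):
    """Return the first matching image URL, or None if there is none."""
    match = IMG_PATTERN.search(page_source)
    return match.group(1) if match else None


# Shortened copy of the HTML comment shown above.
sample = ('<!-- <div id="fbProfileCover"><img class="photo img" '
          'src="http://a1.sphotos.ak.fbcdn.net/hphotos-ak-ash4/'
          '311205_989690200741_4_42618747_1231438675_n.jpg" '
          'style="top:0px;width:100%;" alt="Cover Photo" /> -->')
print(extract_image_url(sample))
```

A capture group avoids counting characters to slice the match, which is what the script further down does.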


My strategy to achieve this was:

1)    Choose any profile id.
2)    Using the Graph API, verify whether the id is valid. If the id is not valid, the server returns “false”. If the id is valid, the server returns some information such as name, profile id, gender, locale etc.

3)    If the id is valid, send an HTTP request to facebook.com with all the necessary HTTP headers.
For example: http://www.facebook.com/4
The server then redirects us to the actual profile location, which can be read from the Location HTTP header of the server response.

4)    Craft another HTTP request with a valid session cookie and the other mandatory HTTP headers, and request the profile page of the target user. The server returns the client-side code of that user's profile page.

5)    Because the Facebook application returns the actual image location of the profile or cover picture inside an HTML comment, the image URL can easily be extracted from the client-side code using a simple regular expression.

6)    Once you have the image URL, it is very easy to download the picture.
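Step 3 relies on reading the username out of the Location header. A small Python 3 sketch of that one step, assuming the redirect target looks like http://www.facebook.com/zuck (the function name is made up and no request is sent):

```python
from urllib.parse import urlparse


def username_from_location(location):
    """Return the path of the redirect target without surrounding slashes."""
    return urlparse(location).path.strip("/")


print(username_from_location("http://www.facebook.com/zuck"))  # zuck
```

Parsing the URL instead of slicing a fixed number of characters keeps working if the scheme or host in the header changes.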

I have written this Python script to automate the above-mentioned process:
import httplib
import urllib
from urllib import urlretrieve
import gzip
import StringIO
import re
import time
#This is a valid session cookie
cookie = 'valid session cookie'

def ptime():
    tm = time.localtime()
    return str(tm[3])+':'+str(tm[4])+':'+str(tm[5])
loc_r = 'C:\\pics\\'
ext = '.jpg'
pname = 'Profile_Picture_of_'
def name(id):
    conn = httplib.HTTPConnection("graph.facebook.com")
    param = "/?id="+str(id)
    conn.request("GET", param)
    r1 = conn.getresponse()
    data = r1.read()
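    #Close the connection before returning (a close() placed after return never runs)
    conn.close()
    #The first "word space word" pair in the response is taken to be the profile name
    match = re.search(r'\w+\s\w+',data)
    if match:
        return match.group()
    else :
        return 'not found'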
#Function to check if the id is valid or not
def graph_q(id):
    conn = httplib.HTTPConnection("graph.facebook.com")
    param = "/?id="+str(id)
    conn.request("GET", param)
    r1 = conn.getresponse()
    data = r1.read()
    #Close the connection before returning the raw response body
    conn.close()
    return data
#This function grabs the response HTTP headers and extracts the "username", for example "zuck" for Mark Zuckerberg
def redirect(id):
    headers = {"Host": "www.facebook.com",
           "User-Agent": "Mozilla/5.0 (Windows NT 5.1; rv:8.0) Gecko/20100101 Firefox/8.0",
           "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
           "Accept-Language": "en-us,en;q=0.5",
           "Accept-Encoding": "gzip, deflate",
           "Accept-Charset": "ISO-8859-1,utf-8;q=0.7,*;q=0.7",
           "Proxy-Connection": "keep-alive",
           "Cookie": cookie
           }
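    conn = httplib.HTTPConnection("www.facebook.com")
    #The request path must be a string starting with a slash, for example "/4"
    conn.request("GET", "/"+str(id), "", headers)
    response = conn.getresponse()
    #print response.status, response.reason
    #Read the Location header by name instead of by its position in the header list
    location = response.getheader("location", "")
    conn.close()
    #Drop the leading "http://www.facebook.com/" (24 characters) to keep the username
    return location[24:]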
#print redirect(6)
#This function will download the Image
def download(link,id):
    picturename = name(id)
    urlretrieve(link, loc_r+str(id)+'_'+pname+str(picturename)+ext)
#This function will send the original request and from response extract the "image url" using regular expression
def getimageurl(user):
    slash = '/'
    user = slash+user
    headers = {
    "Host": "www.facebook.com",
    "Proxy-Connection": "keep-alive",
    "User-Agent": "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/535.7 (KHTML, like Gecko) Chrome/16.0.912.75 Safari/535.7",
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Accept-Encoding": "gzip, deflate",
    "Accept-Language": "en-us,en;q=0.5",
    "Accept-Charset": "ISO-8859-1,utf-8;q=0.7,*;q=0.7",
    "Cookie": cookie
    }
    conn = httplib.HTTPConnection("www.facebook.com")
    conn.request("GET", user, "", headers)
    response = conn.getresponse()
    compresseddata = response.read()
    compressedstream = StringIO.StringIO(compresseddata)
    gzipper = gzip.GzipFile(fileobj=compressedstream)   
    data = gzipper.read()
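    #Close the connection before returning (a close() placed after return never runs)
    conn.close()
    match = re.search(r'<img class="photo img" src="\S+',data)
    if match:
        #Strip the leading '<img class="photo img" src="' (28 characters) and the closing quote
        return match.group()[28:-1]
    else:
        #Fall back to a placeholder image when no picture url is found
        return "http://www.yooter.com/images/pagenotfound.jpg"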

print "[*] Welcome to facebook mass profile picture downloader"
raw_input('[*] Press Enter to start downloading')
start = raw_input ('[*] Enter start no :')
end =  raw_input ('[*] Enter end no :')
print "[*]",ptime(),"Exploit Started"
for i in range(int(start),int(end)):
    print "[*]",ptime(),"Target profile id",i
    print "[*]",ptime(),"Checking if id",i,"belongs to any valid user account or not"
    check = graph_q(i)
    if check != "false":
        print "[*]",ptime(),"The id ",i,"is valid"
        nm = name(i)
        print "[*]",ptime(),"Profile Name of id",i,"is ",nm
        print "[*]",ptime(),"Trying to retrieve the username of",nm
        username = redirect(i)
        print "[*]",ptime(),"User name of",nm,"is ",username
        print "[*]",ptime(),"Retrieving url of profile picture"
        imgurl = str(getimageurl(username))
        #print imgurl
        print "[*]",ptime(),"Downloading profile picture of",nm
        download(imgurl,i)
        print "[*]",ptime(),nm,"'s Profile picture Download Successful :)"
    else:
        print "[*]",ptime(),"ID",i,"is not a valid facebook account :("
print "[*]",ptime(),"Job Done"
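The gzip step inside getimageurl can be tried in isolation. Below is a minimal Python 3 sketch of the same in-memory decompression, with io.BytesIO standing in for the Python 2 StringIO module. The sample page is made up and is compressed locally instead of being fetched:

```python
import gzip
import io


def gunzip_body(compressed):
    """Decompress a gzip-encoded HTTP body held in memory."""
    with gzip.GzipFile(fileobj=io.BytesIO(compressed)) as gz:
        return gz.read()


# Made-up page content, compressed here to stand in for a server response.
page = b'<img class="photo img" src="http://example.invalid/p.jpg" />'
body = gzip.compress(page)
print(gunzip_body(body) == page)
```

Note that the script sends "Accept-Encoding: gzip, deflate" but only handles gzip, so a deflate-encoded response would need separate handling.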
Abusing the Graph API may not be a very big deal, but I informed Facebook about this HTML comment present in the profile page and shared this exploit code with them. According to them, cover / profile pictures are meant to be public, so the HTML comment does not have any direct impact on the Facebook application. Interestingly, after I got their reply mail, the above-mentioned script stopped working as it did before. Most probably they have implemented some anti-automation technique or something similar to prevent this.

2 comments:

  1. Downloading the photos is the easy part; the hard part is doing it thousands of times per minute without getting temporarily blocked from the API.
