Bank Information
For public information on banks, we searched the banks' own branch directories on their websites, as well as well-known sites and online directories. Our searches were restricted to Argentina, Brazil and the United States, but of course any country could be included. Some of the information had to be extracted more by hand than others, but in most cases we used a process called scraping.
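As a rough illustration of that scraping step, here is a minimal sketch; the URL and the HTML pattern are hypothetical, and every directory needs its own:

import re
from urllib2 import urlopen

# Hypothetical branch-locator page; every bank directory needs its own URL.
BRANCHES_URL = "http://www.examplebank.com/branches"
# Hypothetical pattern; in practice each site needs a pattern (or a real
# HTML parser) tuned to its own markup.
ADDR_PATTERN = re.compile(r'<div class="branch-address">([^<]+)</div>')

def getBranchAddresses():
    # Fetch the page and pull out every address-looking fragment.
    raw = urlopen(BRANCHES_URL).read()
    return ADDR_PATTERN.findall(raw)

for addr in getBranchAddresses():
    print addr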
Buildings
For the part concerning buildings, or possible sites from which to start digging the tunnels, we used a number of different sites. In Argentina we used ZonaProp, SumaVisos and MercadoLibre. It turns out the first two didn't have APIs (or at least none accessible to third-party users), but MercadoLibre fortunately did have a Python library, making it quite easy to make requests and find what we wanted.
In other Latin American countries, such as Brazil, we can also use MercadoLibre. Although it might not be the best possible search engine for real estate, it provided sufficient information for our purposes. Thus, for Brazil we used the same API (we only had to change a couple of characters to get it working the same as in Argentina).
In the US it was a little more complicated. The APIs that are out there don't give out much information, and what they can do is quite limited. They showed estimated prices in an address database, or in another database of buildings categorized by ID (which you can only get to through other means), or simple mortgage prices. Only Zillow gave us any "useful" data with which to automate our search.
Geopositioning (GPS)
The answer here was pretty easy. Google Maps provides a function in its API whereby, by simply giving an address, we can obtain the geographic coordinates of the location. This is similar to what is regularly done in the browser, but more automated.
Applications
Google Maps
As we said above, the only thing we needed from Google Maps was to calculate geographic coordinates using a physical address as a starting point. The code for this is pretty straightforward; only a few lines were needed.
import json
from urllib2 import urlopen
from urllib import urlencode

GEOAPI_AUTH = "YOUR_API_KEY_AUTH"
GEOAPI_URL = "https://maps.googleapis.com/maps/api/geocode/json?"

def getCoordinates(addr):
    # Query the Geocoding API and return (lat, lng), or None if the
    # address couldn't be resolved.
    params = urlencode({'sensor': 'false', 'address': addr, 'key': GEOAPI_AUTH})
    data = json.loads(urlopen(GEOAPI_URL + params).read())
    if data['results']:
        return (data['results'][0]['geometry']['location']['lat'],
                data['results'][0]['geometry']['location']['lng'])
    return None
With this information, we then needed to be able to calculate distances. Once we have the coordinates of a target and of a possible candidate, we can evaluate the candidate by calculating the distance between the two.
import math

GEO_ERR = -1

def getDistance(coordA, coordB):
    # Haversine formula: great-circle distance between two (lat, lon)
    # pairs, returned in meters (R is the Earth's radius in km).
    if coordA and coordB:
        R = 6371
        dLat = math.radians(coordB[0] - coordA[0])
        dLon = math.radians(coordB[1] - coordA[1])
        lat1 = math.radians(coordA[0])
        lat2 = math.radians(coordB[0])
        a = math.sin(dLat/2) * math.sin(dLat/2) + \
            math.sin(dLon/2) * math.sin(dLon/2) * math.cos(lat1) * math.cos(lat2)
        c = 2 * math.atan2(math.sqrt(a), math.sqrt(1-a))
        d = R * c
        return d * 1000
    return GEO_ERR
Remember: we used -1 so that we know when the distance calculation isn't working (whether because of the API or some other problem with the coordinates).
getDistance([-34.6044185, -58.3815473], [-34.602953, -58.381634]) = 163.1492529420307 meters
With these results, and using the lengths of previous tunnels as a baseline, we estimate that for our purposes anything up to 300 meters is interesting. Judging by prior successful tunnels, anything longer is probably too much (think about all the dirt you need to get rid of!).
Zillow API
The problem with most of the real estate APIs we found is that the bulk of the data is private, and what is publicly available isn't particularly helpful or rich in detail.
On top of that, while browsing the information manually is pretty straightforward, if you try to automate it through an API you can't get enough information.
Put simply, the API only lets you access a property's information if you already have its ID; without the proper ID you can't search using other filters or criteria. However, the web page lets you search by ZIP code, and with this we get an "interesting" quantity of real estate in the area (enough for our PoC).
So, with a mix of scraping and using the API, we were able to get what we wanted.
Thus, we can find properties by supplying the ZIP code, the city and the name of the state. Let's try it with San Francisco, California, 94121.
The search engine automatically formats the query (if it's valid) into a URL built from the city, the state and the ZIP code.
As we can see on the web page, on the right there are some recommended properties with their Zillow Property IDs. We are going to use these a little later with the API to get additional information that the web page itself doesn't let us access.
We define a regexp to match the IDs, "\/[0-9]*_zpid\/", and then implement a request that accesses a city within a state directly, changing the location according to its ZIP code.
import re
from urllib2 import urlopen as uopen

ZILLOW_PATTERN = re.compile("\/[0-9]*_zpid\/")

def getCandidates(city, state, zcode):
    # Fetch the search page for a city/state/zipcode and pull out the
    # unique Zillow Property IDs (zpid) embedded in the HTML.
    domain = "http://www.zillow.com/"
    url = "%s%s%s%s" % (domain, city, state, zcode)
    raw_results = uopen(url).read()
    return set(ZILLOW_PATTERN.findall(raw_results))
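For example, a call for our San Francisco query might look like the following (the exact path fragments are an assumption about how Zillow formats its search URLs):

# Hypothetical call; the fragments concatenate into something like
# http://www.zillow.com/san-francisco-ca-94121
print getCandidates("san-francisco-", "ca-", "94121")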
Why did we decide to do it like this? First, because there weren't many options, and second, because most bank directories in the US list the ZIP code next to the branch's physical address, which is really important for narrowing down proximity when we do our searches.
Once we have all the IDs, we feed them into the API, and we are then able to get an address for each property, which for our purposes is of the utmost importance.
from urllib2 import urlopen as uopen
import math
import xml.etree.ElementTree as ET
import re

ZILLOW_AUTH = "YOUR-API-KEY-AUTH"
ZILLOW_ESTIMATE = "http://www.zillow.com/webservice/GetZestimate.htm?zws-id="
ZILLOW_PATTERN = re.compile("\/[0-9]*_zpid\/")
LIMIT = 300  # our distance threshold in meters, from the estimate above

def getCandidates(bank_coord, city, state, zcode):
    domain = "http://www.zillow.com/"
    url = "%s%s%s%s" % (domain, city, state, zcode)
    raw_results = uopen(url).read()
    candidates = []
    for x in set(ZILLOW_PATTERN.findall(raw_results)):
        # x looks like "/12345_zpid/"; x[1:-6] leaves just the numeric ID.
        xml_response = uopen(ZILLOW_ESTIMATE + ZILLOW_AUTH + "&zpid=" + x[1:-6]).read()
        root = ET.fromstring(xml_response)
Up to here, we've fetched all our data about the properties through the API, in XML format. Now we're going to parse out the most important parts and obtain the distance. We save each candidate in a list, ordered by proximity to the bank.
        # getVal (a small helper, not shown) returns an element's text or
        # None; homedetail is the property's detail-page link taken from
        # the same GetZestimate response.
        homedetail = getVal(root.find('response/links/homedetails'))
        lat = getVal(root.find('response/address/latitude'))
        lon = getVal(root.find('response/address/longitude'))
        if lat is not None and lon is not None:
            candidates.append([getDistance([float(lat), float(lon)], bank_coord), homedetail])
    candidates = sorted(candidates, key=lambda x: x[0])
Afterwards, we only show the candidates that don't exceed the distance limit.
    for distance, details in candidates:
        if distance > LIMIT:
            print "[-] Skipping next candidates, over %s meters" % LIMIT
            break
        print "[!] Found candidate:"
        print "[+] Distance:", distance
        print "[+] Details:", details
        print "-------------"
That wraps up the part about the United States. If we were to "standardize" the input (which isn't too tricky), we could repeat the process for any bank.
MercadoLibre API
A little different from how we did things with Zillow, the only option MercadoLibre gives us is to look for properties with three filters (state, city, neighborhood), and (unfortunately) there's no way to order the results in a useful fashion. Clearly, for someone looking for prices in a specific area, MercadoLibre probably isn't going to be that useful, and you'll have better results with a "real estate" API (because those let us search by proximity). Because of this, our results will depend a bit on luck and on how much we can abuse the three filters.
For Argentina, we're going to use a couple of cities in the province of Buenos Aires and some parts of the Capital Federal district. For Brazil, we'll use São Paulo state. The search process for both countries is pretty similar.
In particular, MercadoLibre's API was without question one of the easiest to work with. It is simple to use, and the development setup for testing (along with the documentation they provide) is enough to understand everything.
Below is a simple (but complete) example of how to obtain the full list of buildings using a Python library that can be downloaded for free.
from meli import Meli

MELI_CID = "1337"
MELI_AUTH = "YOUR-API-KEY-AUTH"

# Authenticate against the API and run a search in the buildings category.
meli = Meli(client_id=MELI_CID, client_secret=MELI_AUTH)
url = "/sites/MLA/search?category=MLA1459"
r = meli.get(url)
if r.status_code == 200:
    print r.content
The response holds the results of the query in JSON format. In this case, the URL carries the parameters that the API receives.
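Continuing the snippet above, a minimal sketch of reading that JSON (we assume each result carries at least a title and a price, as the listings we saw did):

import json

# Parse the JSON body and walk the "results" list of matching listings.
if r.status_code == 200:
    data = json.loads(r.content)
    for item in data['results']:
        print item['title'], item['price']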
We're not going to spend much time explaining how the whole API works, because that's not our goal and there is already plenty of literature on the topic. We'll use MLX as our notation for each country (the X represents the country): MLA for Argentina and MLB for Brazil, for example. The category filter is set to buildings (1459), and that's what we're going to look at.
Next, to filter the search results with all the parameters that are useful to us, we can build a query like the one below.
/sites/MLA/search?category=MLA1459&state=Buenos Aires&city=Capital Federal&neighborhood=Microcentro
In reality, each filter value has a unique ID in the database, and when using these IDs it isn't always necessary to apply all the filters at once.
The filter IDs look like this:
"capital federal": "TUxBUENBUGw3M2E1"
"belgrano": "TUxBQkJFTDcyNTJa"
All the IDs are (generally) tied to their physical location. For example, if two cities in different provinces share the same name, using the ID automatically determines which province the place belongs to. It's not difficult to fetch everything and put together a good database, so luckily our conundrum isn't proving too complicated, given that the data is public and access is free.
This works to our advantage because we can search by filtering on the smallest denomination, which in our case is the neighborhood. If we're analyzing a bank in Buenos Aires proper and the bank's address is in Belgrano (a neighborhood of Buenos Aires), we only need the neighborhood filter.
/sites/MLA/search?category=MLA1459&neighborhood=TUxBQkJFTDcyNTJa
Also, with the limit and offset parameters we can page through the results, as in the sketch below.
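A sketch of that paging, reusing the meli client from before (we assume the response's paging section reports the total number of matches, as the search responses we saw did):

import json

def getAllResults(meli, base_url, page_size=50):
    # Walk the result set page by page using limit/offset until the
    # reported total is reached.
    results, offset = [], 0
    while True:
        r = meli.get("%s&limit=%d&offset=%d" % (base_url, page_size, offset))
        if r.status_code != 200:
            break
        data = json.loads(r.content)
        results.extend(data['results'])
        offset += page_size
        if offset >= data['paging']['total']:
            break
    return results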
The rest is a matter of interpreting the results: getting the address of each property, calculating the distance to the bank we applied the filters for, and checking whether the address is useful.
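For instance, a hypothetical final check, reusing getDistance from before (some listings embed coordinates in a location field; where they're missing we could fall back to geocoding the address line with getCoordinates):

def filterByDistance(results, bank_coord, limit=300):
    # Keep only the listings whose embedded coordinates fall within
    # `limit` meters of the bank, sorted by proximity.
    close = []
    for item in results:
        loc = item.get('location', {})
        lat, lon = loc.get('latitude'), loc.get('longitude')
        if lat and lon:
            d = getDistance([float(lat), float(lon)], bank_coord)
            if 0 <= d <= limit:
                close.append((d, item.get('title')))
    return sorted(close)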
Our results
It's important to note that the views we're going to show are from Google Maps, and they're really just to help us orient ourselves a bit (the distances for a tunnel don't need to take into account which way the streets run and that kind of thing ;).
Some close matches we got for Brazil, São Paulo:
In the case of Banco Safra, the coordinates from the listing put us 6 meters from the bank, but the address in the listing itself is much farther away. This would be a false positive, not because of our search mechanism, but because we depend on users filling out the search fields correctly.
And sometimes we find that one building works for several banks.
From closest to farthest away, here are some of the results we got for Argentina, Buenos Aires:
Some examples we got for the USA, San Francisco, making the most of the API's request limit:
Contingency Plan
We believe that organizations, lawmakers and financial institutions should impose new requirements and precautions on the buying and selling of real estate located near strategically central points. In our view, banks should be proactive about this, working out whether they are really at risk of this kind of robbery and what precautions they should take.
Future work
MercadoLibre
There are also many improvements we could make to speed up the process and automate it with better performance.
In MercadoLibre's case, since the filters are what gave us the most trouble, we could organize the banks' information by the filters they will require, and compare each search result against all the banks in the group, saving ourselves some unnecessary searches.
If we have the following banks:
Banco A, Buenos Aires, Capital Federal, Palermo, Mario Bravo 1000
Banco B, Buenos Aires, Capital Federal, Belgrano, Cabildo 100
Banco C, Buenos Aires, Capital Federal, Palermo, Honduras 1000
Banco D, Buenos Aires, Capital Federal, Belgrano, Virrey del Pino 100
Banco E, Buenos Aires, Capital Federal, Palermo, Córdoba 3000
For banks A, C and E the same filters apply:
State: Buenos Aires
City: Capital Federal
Neighborhood: Palermo
and for banks B and D the following:
State: Buenos Aires
City: Capital Federal
Neighborhood: Belgrano
With all the final addresses sorted out, we can group them by filter. Each filter works as a unique key shared by the grouped banks. If we do this, instead of one attempt per bank we only need one query per filter, and we can then compare the distance against each member of the group, as in the sketch below.
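A minimal sketch of that grouping, using the banks listed above:

# Sketch: the (state, city, neighborhood) triple is the grouping key, so we
# run one search per filter combination instead of one search per bank.
banks = [
    ("Banco A", "Buenos Aires", "Capital Federal", "Palermo", "Mario Bravo 1000"),
    ("Banco B", "Buenos Aires", "Capital Federal", "Belgrano", "Cabildo 100"),
    ("Banco C", "Buenos Aires", "Capital Federal", "Palermo", "Honduras 1000"),
    ("Banco D", "Buenos Aires", "Capital Federal", "Belgrano", "Virrey del Pino 100"),
    ("Banco E", "Buenos Aires", "Capital Federal", "Palermo", "Cordoba 3000"),
]

groups = {}
for name, state, city, hood, addr in banks:
    groups.setdefault((state, city, hood), []).append((name, addr))

for filters, members in groups.items():
    # One search per key; each result gets compared against every member.
    print filters, "->", members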
Zillow
In Zillow's case, there is a way to get around the API limits. The request quota is per AUTH KEY, but since a key can be registered with any real email address, we can bypass the limits. Afterwards, it's just a matter of checking when we've reached the limit (add a counter if you like) and rotating between different keys.
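A minimal sketch of that rotation, with a purely assumed per-key quota and placeholder keys:

# Rotate keys with a simple request counter; REQUEST_LIMIT and the key
# list are placeholders for whatever quota actually applies.
ZILLOW_KEYS = ["KEY-1", "KEY-2", "KEY-3"]
REQUEST_LIMIT = 1000  # assumed per-key quota

key_index, request_count = 0, 0

def currentKey():
    global key_index, request_count
    if request_count >= REQUEST_LIMIT:
        # Quota exhausted for this key: move on to the next one.
        key_index = (key_index + 1) % len(ZILLOW_KEYS)
        request_count = 0
    request_count += 1
    return ZILLOW_KEYS[key_index]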
Google Maps
The Google Maps API also has a limit, but APIs such as MercadoLibre's don't, which gives us the opportunity to get more coordinates for different buildings (although they're not always there). In Google's case, we did reach the limit on the number of times we could look up coordinates.
Services
Another thing we could do is pay for the services we used, to raise the limits or get rid of them altogether. The paid version of the Google Maps API lets you search around a specific area and filter by category. One of the categories is called "banks"; need we say more?
Spoiler: by trying only the available properties near the banks (using the bank filter), we're able to get the most complete results from our searches.
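As a rough sketch of what that paid search could look like, using the Places API's Nearby Search endpoint (the key is a placeholder, and the 300 m radius reuses our earlier threshold):

import json
from urllib import urlencode
from urllib2 import urlopen

PLACES_URL = "https://maps.googleapis.com/maps/api/place/nearbysearch/json?"

def getNearbyBanks(coord, radius=300):
    # Ask for places of type "bank" within `radius` meters of a coordinate.
    params = urlencode({'location': "%s,%s" % (coord[0], coord[1]),
                        'radius': radius, 'type': 'bank',
                        'key': "YOUR_API_KEY_AUTH"})
    data = json.loads(urlopen(PLACES_URL + params).read())
    return [p['name'] for p in data.get('results', [])]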
Conclusion
As we've been able to show, the first step for finding a good place for our tunnel is really easy.
We believe this type of research helps raise awareness that the information one can find is abundant, and that people can use it in productive or counterproductive ways (it also largely depends on one's definition of productive; for some, robbing a bank would fall into the first category).
Do you know who to trust with your information?
Fun facts
#1.- While robbing the Banco Río branch in Acassuso in 2006, the tunnelers kept their cool the whole time: they ordered pizza and soda and sang happy birthday to one of the hostages.
When the police finally decided to enter the bank, they found only the hostages, the tunnel through which the thieves had fled, toy weapons, and a note that read: "In a neighborhood of rich folks, without weapons or grudges. It's only money, not love."
#2.- While robbing the Banco de la República in Pasto in 1977, the criminals had time to crack a few jokes. On the vault they wrote in big letters: "Chanfle, they didn't count on my cunning."
Useful Links
Branches for banks and safe deposit boxes: official web pages.
This is the second in a series of articles highlighting different ways to abuse publicly accessible information. Also, a big thanks to Matias A. Ré Medina and Francisco Amato for their huge contributions to the article.