Question:
question about an sql query based on zip codes?
2008-01-22 17:19:09 UTC
hello all. i have a question regarding an efficient way to create a query that looks for users based on zip codes. heres a rundown


my site will have many users with many searchable fields. a hypothetical example follows:

name : foo
username: foo
car_type ford
beer_type budweiser
ZIP_CODE 90210
etc..

ok now i want to be able to search for users within X miles of a zip code. for example, if i wanted to search within 100 miles of the zip code 90210, there would be about 75 zip codes to search through. and further, i want to know who drives a ford and drinks budweiser within 100 miles of 90210. i was able to put together a working query on another website, but i am no sql expert so i have no idea where to start in terms of optimizing it.


NOW FOR THE QUESTION would you approach it by creating 75 different queries, one for each zip code in which you search that zip code for car_type = 'ford' AND beer_type = 'budweiser' ? there has to be a more efficient way. or was i already right?


Thank you very much
Three answers:
Mohamed Mansour
2008-01-22 18:22:26 UTC
Hello,



This will be a long post, cause your question is very important question regarding people who are new to SQL and would like to produce quality, efficient and optimized database designs. I have splitted this into two parts, if you wish to learn how to optimze your database further, read the end. I will first answer your questions then help you optimize.



Regarding your overall questions. It seems you have an application to do in mind that requires zipcodes. The efficient way to do this, is to have tables that deals with zipcodes. Then you have another table that deals with the relationship between users and zipcodes. Please take a small moment to read this short article. http://dev.mysql.com/tech-resources/articles/intro-to-normalization.html It will help you understand what I am talking about. And it is the most important thing to learn when doing SQL Database Design. Database normalizations will produce more optimzed queries and higher efficiency. It will help reduce database size as well as make queries run much faster.



Concerning zipcodes, the way the industry does it, they have zipcodes related to latitudes and longitudes. You will need to buy a database from an outside source, that has uptodate information regarding zipcodes. You could download a database from the net (simple search) but they are not accurate. From those latitudes and longitudes derived from each zipcode, you will be able to find the distance from 10miles or any other miles you need. You need to learn some math and geology to convert latitude and longitude to miles (easy since it is USA) http://academic.brooklyn.cuny.edu/geology/leveson/core/linksa/lat_long.html It is basically looking at the modern map and see what each degree in the latitude and longitude represents as miles. If you notice the following links:

- http://www.allplaces.us/dbz.htm

- http://www.melissadata.com/lookups/zipdistance.asp



They demonstrate the distance between zipcodes, they use the latitude/longitude technique (the difference between longitude and latitude then convert to miles)



Once you find the distance between the both latitudes and both longitudes, you use a simple mathematics formula to calculate the distance... Explanation how to do this would be found here:

http://www.pages.drexel.edu/~brooksdr/DRB_web_page/howto/distance.htm



You will need to select all zipcodes and calculate its distance from the current zipcode using hte function above, then check which ones are within the 100 range.

=========================



Extra stuff to help you optimize your db further



=========================

So let me start out and assist you with your current design, and help improve it one step further in order to make it optimized, fast, and flexible. Most importantly, easy to maintain!



According to your current example, you stated that you have a single table that contains all those data. That is not a normalized database. You could easily expand that design into at least 4 main tables. Each table is independent from each other. So if a person has a Ferrarri, you add that to the database. You could even normalize the Car table further if you really need to describe the Car structure more. That would be a deeper normal form. Hopefully after you read what I am explaining and after reading the article I posted above, then you will understand how to do that. For simplicity purposes, we just need the Car Name, model, and year for this application. Same goes for Beer and Zipcodes.



Table1 : User { userID | username | full name } P_KEY = userID

Table2 : Car { carID | carType | carModel | year} P_KEY = carID

Table3 : Beer { beerID | beerType } P_KEY = beerID

Table4 : Zipcodes {zipID | zipCode | latitude | longitude } P_KEY = zipID

Where P_KEY = primary key



This way, you will normalize your table to create a more efficient database. To communicate between those tables, you will introduce some sort of relational mappings. It depends how much mapping you need. Such as 1-to-1, many-to-1, many-to-man. For example:



A User may have at most one car.

A User may have one or more car.

A User may have one beer type.

A User may have more than one beer type.

A User may have more than one zipcodes.

A User may have at most one zipcode.



If all the relations are one to one, where a user must have at most one Car, at most one Beer type, and at most one zipcode then your relational database could be just one table. Then it is a 1-to-1 mapping. You change mappings within the relational Table in this case our relational table would be ConsistsOf (I am bad with naming, pick something good:> ) We will use this approach since no extra relations are needed.



Table5 : ConsistsOf { userID | carID | beerID | zipID }



Now for a 1-to-1 mapping you assign the P_KEY as (userID). Which means each user will have at most one relation.



Say we would want the user to have multiple cars? In a 1 USER may have more than 1 CAR then your P_KEY would be (userID,carID) in a collection. That will ensure no user can have the same carID. So you can use the same ConsistsOf table.



Now lets say we want the user to have one-to-many cars, and a user has one to many beers, then you need to change your design for the relations, and create more relational tables.



Table5: UsersCar {userID | carID }

Table6: UsersBeer {userID | beerID }

Table7: UsersZipcode {userID | zipID }



That will allow you to create that relation. So every time a user adds info, you will need to insert them into appropriate tables. Again, that link I mentioned has a small example that you could use. For the above tables 5-7, you can use P_KEY ( userID, carID) as collection if you want at most 1 user has at most 1 car. I personally will use that approach than the previous table 5. It is more normalized and flexible to maintain in the future.



That is how I will design my database (first draft) a normalized database. Easy to maintain and efficient and fast. Now te question is how would I query my info? How can I select all my info? The answer is relatively simple, you use SQL Joins. You can join any table with any id, for example the following join will join the car table together withe the user table in one big table (similar to what you have originally stated)



SELECT u.userID, c.carModel

FROM User u LEFT JOIN UsersCar uc ON u.userID = uc.userID

LEFT JOIN Car c ON c.carID = uc.carID;



That will produce a table like the following:



| userID | carModel|

--------------------



Let the DBMS (Database Management System) deal with the efficiency of using indexes, since indexes are the fastest way to search in a database.



This might seem a big complex, but if you read the previous link I mentioned to you, you will understand better. You will learn SQL design very good here.



I hope I helped a little. If you implement or try to implement the above, it will optimize your tables greatly.



Good Luck
rod
2008-01-22 18:27:56 UTC
Are you only going to be dealing with those 75 zip codes for your site or was that just an example? Even at that it's a daunting task. From 90210 you'd have to find the mileage to each of the other 74 zip codes and store it. Then you could query the stored column for distance less than your criteria. But are you ONLY searching from 90210? Or can the search start from ANY of the 75 codes? In that case you'd have to start from each of them and figure the mileage to each of the others and store it.



There are something like 76,000 active zip codes in the US.



The way this is usually handled is with a zip code database that shows a longitude and latitude for each zip code. Then when you enter your search criteria (75 miles from 90210) the formula checks the longitude/latitude for the center of 90210 and then finds all zip codes whose long/lat coordinates are within 75 miles.



Try this site for a free zip code database download and some other tips and formulas:



http://www.darkshire.org/~jhkim/public_html/programming/zipcodes/
TristanVR
2008-01-22 17:31:44 UTC
I would build one big dynamic query that included all the criteria you wanted. If that included 10 zip codes then fine.



WHERE car_type = 'ford' AND beer_type = 'budweiser'

AND zip IN (90222, 90320, 90323)


This content was originally posted on Y! Answers, a Q&A website that shut down in 2021.
Loading...