How to scrape local business leads in Python without an API
Apollo and ZoomInfo are B2B-only and expensive. For local services you scrape Bing and Yelp. The polite-crawler pattern, dedupe rules, and legal guardrails.
You need a list of every plumber in three Maine cities. Or every CrossFit gym in Massachusetts. Or every electrician within thirty miles of your office.
The big-name lead tools (Apollo, ZoomInfo, Clay, Lusha) cannot help you. They are B2B-focused and expensive ($49 to $149+ per user per month), and their data is mostly companies with LinkedIn pages, not solo trade-business operators with a website and a Yelp listing.
For local services, you scrape. Here is the actual pattern.
Where the data actually is
For local-services prospecting, three sources are worth scraping:
- Bing search results. Bing exposes business listings in the search results, and unlike Google has been relatively friendly to crawling for years. Google deprecated their public Search API and made the SERP hostile to scraping. Bing’s SERP is straightforward.
- Yelp listings. The Yelp Fusion API exists but rate-limits hard on the free tier and requires a developer account. The web listings are public and contain the same data plus reviews.
- Google Business Profile listings. Available, but Google blocks scraping aggressively. Skip unless you really need the Maps coordinates and are ready to invest in proxy infrastructure.
Two sources (Bing, Yelp) get you most of the way. Adding Google adds maybe 15% more leads at 10x the engineering cost. Not worth it for most projects.
The polite-crawler pattern
The line between “scraping” and “getting blocked instantly” is whether you act like a respectful crawler or a denial-of-service tool.
Five rules:
- Identify yourself. Set a User-Agent that says who you are. `Mozilla/5.0 (compatible; YourCompanyBot/1.0; +https://yoursite.com/about)` is fine. Do not pretend to be Chrome.
- Respect robots.txt. Read it before you crawl. If a path is disallowed, don't.
- Rate limit. One request every two to five seconds is the floor for anonymous crawling. Faster than that and you’re going to get banned.
- Back off on errors. If you get a 429 or a 503, wait a full minute before retrying. If you get three in a row, stop the run and email yourself.
- Cache aggressively. Save every response to disk. If you re-run the script tomorrow, don’t re-fetch URLs you already have. This saves your IP and saves you time.
If you follow those five, you can crawl Bing and Yelp at one-request-per-three-seconds for hours without getting blocked. Faster than that or sloppier than that and you will.
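The five rules above fit in one fetch function. A minimal sketch using only the standard library; the `USER_AGENT` string, cache directory, and timing constants are illustrative, not prescriptive:

```python
import time
import urllib.error
import urllib.request
import urllib.robotparser
from hashlib import sha256
from pathlib import Path
from urllib.parse import urlparse

USER_AGENT = "Mozilla/5.0 (compatible; YourCompanyBot/1.0; +https://yoursite.com/about)"
CACHE_DIR = Path("cache")
DELAY_SECONDS = 3           # rule 3: one request every three seconds
BACKOFF_SECONDS = 60        # rule 4: wait a full minute on 429/503
MAX_CONSECUTIVE_ERRORS = 3  # rule 4: three in a row ends the run

_robots = {}  # host -> RobotFileParser, fetched once per host

def allowed(url: str) -> bool:
    """Rule 2: check robots.txt before crawling a path."""
    host = urlparse(url).netloc
    if host not in _robots:
        rp = urllib.robotparser.RobotFileParser()
        rp.set_url(f"https://{host}/robots.txt")
        rp.read()
        _robots[host] = rp
    return _robots[host].can_fetch(USER_AGENT, url)

def fetch(url: str) -> str:
    """Cached, rate-limited, robots-respecting GET."""
    CACHE_DIR.mkdir(exist_ok=True)
    cache_file = CACHE_DIR / sha256(url.encode()).hexdigest()
    if cache_file.exists():          # rule 5: never re-fetch a cached URL
        return cache_file.read_text()
    if not allowed(url):
        raise PermissionError(f"robots.txt disallows {url}")
    errors = 0
    while True:
        time.sleep(DELAY_SECONDS)    # rule 3: rate limit every live request
        req = urllib.request.Request(url, headers={"User-Agent": USER_AGENT})
        try:
            with urllib.request.urlopen(req) as resp:
                html = resp.read().decode("utf-8", errors="replace")
            cache_file.write_text(html)
            return html
        except urllib.error.HTTPError as e:
            if e.code in (429, 503):
                errors += 1
                if errors >= MAX_CONSECUTIVE_ERRORS:
                    raise RuntimeError("repeated 429/503, stopping run")
                time.sleep(BACKOFF_SECONDS)
            else:
                raise
```

Everything downstream calls `fetch()` and nothing else, so the politeness rules are enforced in one place and cannot be accidentally bypassed by a new parser.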
What the output looks like
For local services, the row you want is:
| Field | Source |
|---|---|
| Business name | Bing or Yelp listing title |
| Phone | Listing detail page |
| Address (street, city, state, zip) | Listing detail page |
| Website URL | Listing detail (where present) |
| Notes | Number of reviews, star rating, year established (where present) |
| Source URL | The Bing or Yelp page that produced the row |
Source URL is non-negotiable. When you are cold-emailing or cold-calling, you need to verify “is this a real business” before you reach out. Linking back to the source listing makes that fast.
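One way to make the source URL non-negotiable in practice is to enforce it at write time. A sketch with `csv.DictWriter`; the column names here are assumptions, not a fixed schema:

```python
import csv

# Illustrative column names matching the table above
FIELDS = ["business_name", "phone", "street", "city", "state", "zip",
          "website", "notes", "source_url"]

def write_leads(rows, path="leads.csv"):
    """Write scraped rows to CSV, rejecting any row without a source URL."""
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        writer.writeheader()
        for row in rows:
            if not row.get("source_url"):
                raise ValueError(f"row missing source_url: {row!r}")
            writer.writerow(row)
```

Failing loudly at write time is cheaper than discovering during outreach that half the rows cannot be verified.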
Dedupe by phone number, not by name
Business names are inconsistent. “Bob’s Plumbing” appears as “Bob’s Plumbing LLC”, “Bob Plumbing”, and “Plumbing by Bob” across three sources. Phone numbers don’t have that problem.
Normalize phone numbers to E.164 (+12075551234) and dedupe on that. You will catch about 80% of duplicates that name-matching misses. Add a fuzzy name match on top for the remaining 20% (Levenshtein distance under 3, business names lowercased and stripped of “LLC”, “Inc”, “&”).
What to skip
Email scraping from websites. Most local trade businesses don’t list emails. The ones that do put them in image format or contact forms specifically to defeat scrapers. Stick with phone for cold outreach to local services.
Google Maps API. It exists, has a generous free tier, and is the obvious choice. Skip it because the Terms of Service prohibit displaying Maps data outside Google Maps surfaces. If you are building a list and reselling it or using it in your own UI, you are violating the TOS even on the free tier. Some teams ignore this and operate fine for years. Some teams get cease-and-desist letters. Pick your risk tolerance.
LinkedIn. They block scraping aggressively, sue scrapers, and the data is mostly white-collar B2B. Wrong tool for local services anyway.
Anything that requires logging in. Once you are logged in, the site’s TOS apply to you specifically, and the legal exposure goes from “gray area” to “clearly violating terms.” Stick with public, unauthenticated pages.
The legal short version
In the US, scraping public, unauthenticated web pages is generally legal under the hiQ v. LinkedIn precedent. Selling that data is also generally legal. What gets you sued is bypassing technical access controls (login walls, CAPTCHAs, IP blocks), violating a site’s TOS while logged in, or scraping at a rate that effectively denies service.
You are responsible for your own use. Local-services prospecting from public Bing and Yelp listings is well inside the safe zone. If your use case sounds different from that, get an actual lawyer to look at it.
What you do with the CSV
Once you have a clean leads.csv:
- Filter to your service area (sometimes the listings include businesses 50 miles away that match the search term)
- Filter out repeat sources (your existing customers, agencies you compete with)
- Hand off to a cold-email tool (Apollo, Smartlead, Instantly) or a phone-call workflow
- Keep the CSV alongside whatever outreach you do, with a “first contacted on” column. Re-running the scrape next month and removing already-contacted leads is the difference between reaching every business once and burning your domain.
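The re-run step above can be a few lines of filtering. A sketch; the `first_contacted_on` column name is an assumption to match however you track outreach:

```python
import csv

def remove_contacted(leads_path="leads.csv", out_path="fresh_leads.csv"):
    """Drop rows that already have a 'first contacted on' date filled in."""
    with open(leads_path, newline="") as f:
        rows = list(csv.DictReader(f))
    fresh = [r for r in rows if not r.get("first_contacted_on", "").strip()]
    with open(out_path, "w", newline="") as f:
        if fresh:
            writer = csv.DictWriter(f, fieldnames=list(fresh[0].keys()))
            writer.writeheader()
            writer.writerows(fresh)
    return fresh
```

Run it after each monthly scrape, before handing the CSV to the outreach tool, so no business hears from you twice.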
What this looks like, built
Lead Spider is the Python implementation we use for our own prospecting. It reads a region and category, crawls Bing and Yelp at the polite-crawler rate, dedupes by normalized phone, and writes a clean leads.csv with name, phone, address, website, and notes. $49, single-buyer commercial license, source you run on your own machine.
If you have a list of cities and a list of categories and you have been thinking “I should just type these into Yelp and copy them down by hand,” the script that does it for you is closer than you think.