How to Solve AWS WAF Captcha Using Python
In this blog post, you will learn how to bypass AWS WAF captcha puzzles and automate captcha solving in your workflow.
Before we start solving the AWS WAF Captcha, here are the requirements used in this process:
- Python
- CapSolver API Key
Verifying the Captcha-Protected Website URL
First, find the link with AWS WAF captcha protection that you want to bypass. To confirm that the website URL is valid for this method, follow these steps:
Note: For this tutorial we will use a demo link.
➜ Make an HTTP request to the website URL and verify that it returns a 405 or 202 status code.
import requests
WEBSITE_URL = "https://efw47fpad9.execute-api.us-east-1.amazonaws.com/latest"
# Check the response code of the captcha-protected website.
response = requests.get(WEBSITE_URL)
print(response.status_code)  # output: 405
➜ Verify that the HTML content of the response contains elements such as iv, key, and context (for response code 405).
import requests
WEBSITE_URL = "https://efw47fpad9.execute-api.us-east-1.amazonaws.com/latest"
# Check the HTML content of the captcha-protected website.
response = requests.get(WEBSITE_URL)
print(response.text)
Note: If the response code of the link is 202, we only require the AWS challenge JS; key, iv, and context can be ignored.
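The two checks above can be folded into one helper. The following is a minimal sketch that assumes the page embeds the values in the `"key":"..."` form this tutorial searches for; the function name `extract_waf_params` and the sample HTML are invented for illustration and do not come from a real AWS WAF page.

```python
import re

# Patterns follow this tutorial; the exact page markup is an assumption.
KEY_RE = re.compile(r'"key":"([^"]+)"')
IV_RE = re.compile(r'"iv":"([^"]+)"')
CONTEXT_RE = re.compile(r'"context":"([^"]+)"')
SCRIPT_SRC_RE = re.compile(r'<script[^>]*\ssrc="([^"]+)"')


def extract_waf_params(status_code, html):
    """Return the fields the solver needs for a 405 or 202 response."""
    if status_code not in (405, 202):
        raise ValueError(f"unexpected status code: {status_code}")

    params = {}
    script = SCRIPT_SRC_RE.search(html)
    if script:
        params["challenge_js"] = script.group(1)

    # key, iv, and context are only required for the 405 (captcha) case.
    if status_code == 405:
        for name, pattern in (("key", KEY_RE), ("iv", IV_RE), ("context", CONTEXT_RE)):
            match = pattern.search(html)
            if match is None:
                raise ValueError(f"{name} not found in the 405 page")
            params[name] = match.group(1)
    return params


# Hypothetical page body, invented for illustration only.
SAMPLE_HTML = (
    '<script src="https://example.com/challenge.js"></script>'
    '<script>window.props = {"key":"K1","iv":"IV1","context":"CTX1"};</script>'
)

print(extract_waf_params(405, SAMPLE_HTML))
print(extract_waf_params(202, SAMPLE_HTML))
```

For a real page, pass `response.status_code` and `response.text` from the request shown earlier instead of the sample string.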
Bypassing the Captcha-Protected Website URL
Here is how it works: the website's actual content is protected by a captcha page, and when a real human visits the website, they are greeted with that captcha page. After the captcha is successfully verified, the website generates a unique aws-waf-token cookie and then requests the same link with that cookie.
By using the unique aws-waf-token cookie, the server can identify the visitor as a verified user who has passed the CAPTCHA challenge. The server then serves different content at the same link than what is shown to users who haven't completed the verification.
After the captcha is verified, the same link leads to the actual content page because the request now has the aws-waf-token cookie attached.
So we need that cookie token to get the content page directly, without manually verifying the captcha. For that we will use CapSolver's services. They solve these captchas using their machine learning tools and return the aws-waf-token cookie, which we can use to get the primary page without manually solving captchas.
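To make the cookie mechanism concrete, here is a small sketch of what "the same link, with the cookie attached" looks like at the HTTP level. It uses only the standard library and builds the request without sending it; the URL and token value are placeholders, and `build_verified_request` is a hypothetical helper name.

```python
from urllib.request import Request

# Placeholder values for illustration; a real token comes from a solved captcha.
PAGE_URL = "https://example.com/protected"
TOKEN = "example-token-value"


def build_verified_request(url, token):
    """Build (but do not send) a GET request carrying the aws-waf-token cookie."""
    return Request(url, headers={"Cookie": f"aws-waf-token={token}"})


request = build_verified_request(PAGE_URL, TOKEN)
print(request.get_method(), request.full_url)
print(request.get_header("Cookie"))
```

The only difference between the blocked request and the accepted one is the `Cookie` header; the URL stays the same.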
We will use Python to bypass the captcha using the CapSolver AntiAwsWafTask. Here's the step-by-step process we will follow:
❯ If the response code is 405, we will scrape the iv, key, and context. If the response code is 202, we will scrape the challenge JavaScript URL.
❯ We will send the website URL to the CapSolver API along with the required parameters (key, iv, context, etc.). In response, we will receive a unique Task ID assigned to our captcha-solving task.
❯ The captcha-solving task typically takes between 5 and 30 seconds to complete. Once the task is completed, we will verify its success by providing the Task ID to the CapSolver API.
❯ If the task completed successfully, we will receive an “aws-waf-token” cookie as the result. This cookie serves as a verification token that allows us to access the actual content of the website that was previously protected by the captcha.
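Since solving time varies from 5 to 30 seconds, a polling loop may be more robust than a single fixed wait. The sketch below is one possible shape, not the service's prescribed method: `wait_for_solution` and `make_fake_fetcher` are hypothetical names, the HTTP call is replaced by a stand-in so the example runs offline, and the "processing" status is a placeholder for any value other than "ready".

```python
import time


def wait_for_solution(fetch_result, task_id, interval=3.0, timeout=60.0, sleep=time.sleep):
    """Poll fetch_result(task_id) until its status is "ready" or the timeout passes."""
    waited = 0.0
    while True:
        result = fetch_result(task_id)
        if result.get("status") == "ready":
            return result["solution"]["cookie"]
        if waited >= timeout:
            raise TimeoutError(f"task {task_id} not ready after {timeout} seconds")
        sleep(interval)
        waited += interval


# Stand-in for the real HTTP call, so the sketch runs without network access.
def make_fake_fetcher(ready_after):
    calls = {"count": 0}

    def fetch(task_id):
        calls["count"] += 1
        if calls["count"] >= ready_after:
            return {"status": "ready", "solution": {"cookie": "fake-token"}}
        return {"status": "processing"}  # placeholder for "not ready yet"

    return fetch


token = wait_for_solution(make_fake_fetcher(ready_after=3), "task-123", sleep=lambda s: None)
print(token)
```

In real use, `fetch_result` would be a function that posts the API key and Task ID to the result endpoint and returns the parsed JSON.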
Here’s a code snippet that performs the steps above (we are using the 405 demo link):
import re
import time
import requests
# Create a session to reuse the same connection for multiple requests
client = requests.Session()
# CAPSOLVER API key and endpoint
CAPSOLVER_API_KEY = "YOUR-API-KEY"
CAPSOLVER_API_ENDPOINT = "https://api.capsolver.com/createTask"
# The URL of the website protected by AWS WAF. (405 demo link)
WEBSITE_URL = "https://efw47fpad9.execute-api.us-east-1.amazonaws.com/latest"
# Extract the data (key, iv, context) and the challenge script URL from the page content using regex
script_content = client.get(WEBSITE_URL).text
key_match = re.search(r'"key":"([^"]+)"', script_content)
iv_match = re.search(r'"iv":"([^"]+)"', script_content)
context_match = re.search(r'"context":"([^"]+)"', script_content)
jschallange_match = re.search(r'<script.*?src="(.*?)".*?></script>', script_content)
if key_match and iv_match and context_match and jschallange_match:
    key = key_match.group(1)
    iv = iv_match.group(1)
    context = context_match.group(1)
    jschallange = jschallange_match.group(1)
else:
    # Stop here: the task below cannot be built without these values
    raise SystemExit("Key, IV, Context, or challenge script not found in the page.")
# Prepare data to send in the POST request to create a CAPTCHA-solving task
data = {
    "clientKey": CAPSOLVER_API_KEY,
    "task": {
        "type": "AntiAwsWafTaskProxyLess",
        "websiteURL": WEBSITE_URL,
        "awsKey": key,
        "awsIv": iv,
        "awsContext": context,
        "awsChallengeJS": jschallange
    }
}
# Send a POST request to the CAPSOLVER API to create a task and obtain the task ID
task_id_response = client.post(CAPSOLVER_API_ENDPOINT, json=data)
task_id = task_id_response.json()['taskId']
# Wait for 10 seconds to give the CAPSOLVER service time to process the task.
time.sleep(10)
# Send a POST request to get the result of the CAPTCHA-solving task using the task ID
cookie_response = client.post("https://api.capsolver.com/getTaskResult", json={"clientKey": CAPSOLVER_API_KEY, "taskId": task_id}).json()
if cookie_response["status"] == "ready":
    # Get the cookie (AWS WAF token) from the CAPSOLVER response
    cookie = cookie_response["solution"]["cookie"]
    # Make a GET request to the protected website, passing the obtained AWS WAF token as a cookie.
    website_content = client.get(WEBSITE_URL, cookies={"aws-waf-token": cookie})
    # Print the content of the protected website
    print(website_content.text)
else:
    print("capsolver failed to solve the captcha, please try again. ")
Note: If the response code is 202, we only need to pass the challenge JavaScript as the “awsChallengeJS” parameter in the task.
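The difference between the two cases can be sketched as a small helper that builds the task dictionary. The field names are the ones used in the snippet above; `build_task` itself is a hypothetical helper, and the exact set of fields the service accepts should be confirmed against its documentation.

```python
def build_task(website_url, status_code, challenge_js=None, key=None, iv=None, context=None):
    """Build the "task" part of the request for a 405 or 202 page."""
    task = {"type": "AntiAwsWafTaskProxyLess", "websiteURL": website_url}
    if challenge_js:
        task["awsChallengeJS"] = challenge_js

    if status_code == 405:
        # The captcha page needs key, iv, and context.
        fields = (("key", key), ("iv", iv), ("context", context))
        missing = [name for name, value in fields if not value]
        if missing:
            raise ValueError(f"missing values for a 405 page: {', '.join(missing)}")
        task.update(awsKey=key, awsIv=iv, awsContext=context)
    elif status_code == 202:
        # The challenge page only needs the challenge JavaScript URL.
        if not challenge_js:
            raise ValueError("a 202 page requires the challenge JavaScript URL")
    else:
        raise ValueError(f"unexpected status code: {status_code}")
    return task


# Placeholder values for illustration only.
print(build_task("https://example.com", 202, challenge_js="https://example.com/challenge.js"))
print(build_task("https://example.com", 405, key="K1", iv="IV1", context="CTX1"))
```

The returned dictionary goes under the "task" key of the request body, next to "clientKey".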
For more flexibility you can add error handling, proxy support, etc. For further details, refer to the official documentation: https://docs.capsolver.com/guide/captcha/awsWaf.html
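As one example of error handling, the task-creation step can fail without returning a Task ID (for instance with an invalid API key). A minimal sketch, assuming only that a successful response contains "taskId" as in the snippet above; `require_task_id` is a hypothetical helper and the failing response body is invented for illustration.

```python
def require_task_id(body):
    """Return body["taskId"], or raise with the full response for debugging."""
    task_id = body.get("taskId") if isinstance(body, dict) else None
    if not task_id:
        raise RuntimeError(f"createTask did not return a taskId: {body!r}")
    return task_id


print(require_task_id({"taskId": "abc-123"}))

try:
    # Invented failure body; the real error format may differ.
    require_task_id({"error": "invalid key"})
except RuntimeError as exc:
    print(exc)
```

Calling this on the parsed task-creation response, instead of indexing `['taskId']` directly, turns a bare `KeyError` into a message that shows what the API actually returned.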
In summary, tackling AWS WAF Captchas might seem challenging, but with the help of capsolver.com and Python, the process can be automated easily.