Hi again. In this blog, I will discuss my approach to reversing and solving the GeeTest captcha.
Goal
The goal is to solve the GeeTest captcha programmatically, without using any automation tools. I first tried to understand how this captcha works and found some approaches, but they all involved automating the browser at the final stage.
- The most common approach was:
- using OpenCV to get the edges of the piece (the one meant to be placed in the hole in the image) and of the hole,
- calculating the offset between the two and setting the mouse cursor accordingly (in the script), and
- finally, using an automation tool to do the last task, that is, sliding the cursor (this is only one type of captcha, there are other types of puzzles too).
- I am sharing the links of the approaches I found:
Since I was told to solve it programmatically and to avoid any kind of open-source solver or automation tool, I decided to dive in at the root level and figure out how this captcha works. [There was a time constraint for this, but I thought it was worth a try.]
- Please refer to my repo for the files mentioned below. Repo
Part-1
- I opened the developer tools and inspected the files that were loaded while the captcha was displayed and also when it was verified.
The interesting thing to look at is `captcha_output`: we need to reverse engineer how it is generated, and for this we need to go through the files and reverse the logic behind it. As we can see, there are many files related to `gcaptcha...`. I went through all of them and tried to understand them. One file was heavily obfuscated (the code is roughly 11k+ lines due to the obfuscation), and that was enough to press the panic button. I tried beautifying the .js scripts using online tools like https://beautifier.io/
I searched for some relevant keywords in the code to verify that I was working on the correct file.
I looked at the first function in the `gcaptcha4.js` file. It was obfuscated through control-flow flattening, since we can see the dispatcher and many branches here. I reversed it manually, since it was pretty clear that it decrypts the long URI string with the key `(0ZZi2` and stores the decrypted strings as an array; the function takes an index and returns the entry stored at that index. Look at `decode_func1.js`; the results are stored in `decrypted_output_1.txt` and `decrypted_output.txt`.
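To make the idea of an index-based string table concrete, here is a minimal Python sketch. It does not reproduce the actual cipher in `gcaptcha4.js` (that is in `decode_func1.js`): the repeating-key XOR, the `^` separator, and the names `decode_table`, `encode_table` and `lookup` are all assumptions made for illustration.

```python
from urllib.parse import quote, unquote

SEPARATOR = "^"  # assumed delimiter between table entries (hypothetical)


def xor_with_key(text, key):
    """XOR every character with the key, repeating the key as needed."""
    return "".join(chr(ord(ch) ^ ord(key[i % len(key)])) for i, ch in enumerate(text))


def decode_table(encoded, key):
    """Percent-decode the blob, undo the XOR, and split it into a string table."""
    return xor_with_key(unquote(encoded), key).split(SEPARATOR)


def encode_table(entries, key):
    """Inverse of decode_table, used here only to build demo input."""
    return quote(xor_with_key(SEPARATOR.join(entries), key), safe="")


KEY = "(0ZZi2"
blob = encode_table(["captcha_output", "pow_sign", "lot_number"], KEY)
table = decode_table(blob, KEY)


def lookup(index):
    """What the obfuscated accessor boils down to: return the string at index."""
    return table[index]


print(lookup(0))
```

Once the table is decoded, every call to the accessor in the obfuscated code is just a constant string in disguise.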
Moving forward, the next function also involved control-flow flattening, but I don't think it was of much help. The major portion of the code is obfuscated in a different way.
I tried many online deobfuscators but they were not of any help. [Here I wasted a lot of time searching for deobfuscators.]
Then I shifted to another Herculean task: writing my own deobfuscator for this. I tried refactoring the code with help from https://astexplorer.net/ [to understand its AST structure]. Again, I wasted a lot of time here (around 3 hours) because I still didn't get any good results. Then I came to the last option: reversing the code manually. That took a lot of time, but at least it was giving some results. I used `regex.py` (provided) to replace all the array-indexing calls with the corresponding entries of the big array that I had decoded and stored earlier. So by now the code is a bit cleaner and a bit more understandable.
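The substitution step can be sketched roughly like this. It is not the actual `regex.py` from the repo: the accessor name `$_DEC`, the call shape `$_DEC(<index>)`, and the tiny string table are placeholders I am assuming for illustration.

```python
import json
import re

# Hypothetical decoded string table (the real one has far more entries).
STRING_TABLE = ["captcha_id", "lot_number", "pow_sign"]

# Assumed call shape: <accessor>(<decimal index>), e.g. $_DEC(2)
CALL_PATTERN = re.compile(r"\$_DEC\((\d+)\)")


def inline_strings(source, table):
    """Replace every table lookup call with the string literal it resolves to."""
    def substitute(match):
        index = int(match.group(1))
        if index >= len(table):
            return match.group(0)  # leave unknown indexes untouched
        return json.dumps(table[index])  # quoted, escaped string literal
    return CALL_PATTERN.sub(substitute, source)


obfuscated = "x[$_DEC(0)] = cfg[$_DEC(1)]; sign(x, $_DEC(2), $_DEC(99));"
cleaned = inline_strings(obfuscated, STRING_TABLE)
print(cleaned)
# x["captcha_id"] = cfg["lot_number"]; sign(x, "pow_sign", $_DEC(99));
```

Leaving out-of-range indexes untouched makes it easy to spot lookups that the decoded table does not cover yet.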
function detectAutomation(env) {
  const checks = {};
  // Check for PhantomJS
  checks.phantom = ("_phantom" in env || "callPhantom" in env) ? true : false;
  // Check for WebDriver
  checks.webdriver = ("webdriver" in navigator) ? true : false;
  // Check for Nightmare.js
  checks.nightmare = ("__nightmare" in env) ? true : false;
  // Check for ChromeDriver artifacts
  checks.cdc = ("$cdc_asdjflasutopfhvcZLmcfl_" in env) ? true : false;
  // Try object descriptor test
  try {
    const desc = Object.getOwnPropertyDescriptor(navigator, "webdriver");
    checks.webdriverDescriptor = typeof desc === "object";
  } catch (e) {
    checks.webdriverDescriptor = false;
  }
  const susSubstrings = ["ph", "cp", "ek", "wd", "nt", "si", "sc"];
  for (const key of Object.keys(env)) {
    if (susSubstrings.some(sig => key.includes(sig))) {
      checks["sus:" + key] = true;
    }
  }
  return checks;
}
This script checks for some automation libraries and frameworks. Basically, it is a bot detection snippet.
Now my focus is to find the logic behind the params that we saw when the captcha is verified. How is the encryption handled, and which cryptographic algorithm is it using?
Lines 2500-2900 handle the captcha's events and handlers. I am unable to paste that large chunk of code here.
This was something very interesting for me: it seemed to be a public key, 129 bytes long. I wanted to confirm the algorithm first.
>>> int("10001", 16)
65537 <--- this is the famous number used as e in the `RSA algorithm`
The other params like n, e, d, p, q mostly look like RSA params. Still, we are not sure that this captcha is using RSA, because I saw other algorithms too while going through the code. Also, since RSA is an asymmetric crypto algorithm, to decrypt something we would still need the private key.
This code snippet is listening to and monitoring our mouse moves.
var u = 0, c = function x(_ᖙᕴᕾᕶ) {
var _ᖗᖗᖘᕾ = var_a.proxy1, _ᖈᕹᕴᖃ = ["$_DCIJm"].concat(_ᖗᖗᖘᕾ), arr1 = _ᖈᕹᕴᖃ[1];
_ᖈᕹᕴᖃ.shift();
var _ᖘᕹᕴᕸ = _ᖈᕹᕴᖃ[0];
if (256 <= (u = u || 0) || _ᕸᕺᖂᖗ <= a) window.removeEventListener ? (u = 0, window.removeEventListener("mousemove", x, false)) : window.detachEvent && (u = 0, window.detachEvent("onmousemove", x)); else try {
var s = _ᖙᕴᕾᕶ.x + _ᖙᕴᕾᕶ.y;
_ᖆᕸᕶᖁ[a++] = 255 & s, u += 1;
} catch (e) {}
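Read in plain terms, the handler above keeps the low byte of `x + y` for each mouse move and unsubscribes once it has seen 256 events or the pool is full. A small Python sketch of that bookkeeping follows; the class name `MousePool` and the pool size of 256 are my assumptions, since the snippet does not show the real pool size.

```python
class MousePool:
    """Collects one byte per mouse move, mirroring the handler's bookkeeping."""

    MAX_EVENTS = 256  # the handler stops after 256 mouse moves

    def __init__(self, pool_size=256):  # pool_size is an assumed value
        self.pool = [0] * pool_size
        self.write_index = 0   # `a` in the snippet
        self.event_count = 0   # `u` in the snippet
        self.listening = True

    def on_mouse_move(self, x, y):
        if not self.listening:
            return
        if self.event_count >= self.MAX_EVENTS or self.write_index >= len(self.pool):
            self.listening = False  # equivalent of removeEventListener
            return
        self.pool[self.write_index] = (x + y) & 255  # keep only the low byte
        self.write_index += 1
        self.event_count += 1


pool = MousePool()
for step in range(300):
    pool.on_mouse_move(100 + step, 200 + 2 * step)
print(pool.event_count, pool.listening)
```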
Looks like we have got AES with `iv: 0000000000000000` and `CBC` mode. The key may be `var str__1 = (0, _ᕹᕾᕶᖙ.guid)()`, some random string, though I am not sure at this point.
CONSISTENT: u = "3", UNSURE: "2", INCONSISTENT: _ = "1", TESTS: a = {PHANTOM_UA: "aup", PHANTOM_PROPERTIES: "sep", PHANTOM_LANGUAGE: "egp", HEADCHR_UA: "auh", WEBDRIVER: "rew", HEADCHR_PERMISSIONS: "snh", SELENIUM_DRIVER: "res", CDC: "cdc"}});
<- Some useful data. [Lines 4300 to 4500 monitor the browser details and other frameworks.]
Lines 3600 to 4000 or so involve implementations of some crypto algorithms, like crc32, base64, shifts, padding and many others…
I was stuck at this code snippet. It has `pow_msg`, `device_id`, `lot_number`, `pow_sign`: what are these? They must be involved in the encryption part.
`parseLotString`, `getStringByIndexes`: I looked more at these. Look at the `parselotString.js` file (provided) and also the `getStringByIndexes.js` func, which are their deobfuscated versions.
It needs:
pow_detail.hashfunc
pow_detail.bits
pow_detail.version
pow_detail.dateTime
lotNumber
captchaID
These must be needed to generate the `pow_msg` and `pow_sign`.
Then I searched for the keyword `pow_sign` and found this snippet.
- I have messed up the variable naming, I am very sorry for that, but I can guess the params here.
var a, _ = bits % 4, u = parseInt(bits / 4, 10), c = (a = "0", new Array(u + 1).join(a)), h = version + "|" + bits + "|" + hashfunc + "|" + datetime + "|" + captchaID + "|" + lotNumber + "|" + emptystring + "|";
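Putting this together, the proof-of-work can be sketched in Python as below. Only the prefix layout comes from the snippet. The rest is my assumption, based on the `pow_msg` and `pow_sign` values that show up in the request payload: a random 16-character hex nonce is appended, the whole message is hashed with `hashfunc`, and the loop repeats until the digest has `bits` leading zero bits. The function name `solve_pow` is hypothetical.

```python
import hashlib
import secrets


def solve_pow(version, bits, hashfunc, datetime_str, captcha_id, lot_number):
    """Return (pow_msg, pow_sign) whose digest starts with `bits` zero bits."""
    # Same field order as the snippet; the empty field gives the double "|".
    prefix = f"{version}|{bits}|{hashfunc}|{datetime_str}|{captcha_id}|{lot_number}||"
    while True:
        nonce = secrets.token_hex(8)  # 16 hex chars, assumed nonce format
        pow_msg = prefix + nonce
        digest = hashlib.new(hashfunc, pow_msg.encode()).digest()
        # Assumed acceptance rule: the top `bits` bits of the digest are zero.
        if int.from_bytes(digest, "big") >> (len(digest) * 8 - bits) == 0:
            return pow_msg, digest.hex()


pow_msg, pow_sign = solve_pow(
    "1", 8, "sha256", "2025-09-09T05:30:45.059386+08:00",
    "54088bb07d2df3c46b79f80300b0abbe", "0c4fc046a1d6443598e4dcfb868f0af6",
)
print(pow_msg)
print(pow_sign)  # begins with "00" when bits is 8
```

With `bits = 8` the loop needs about 256 attempts on average, so it finishes almost instantly.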
I used the order of params from the previous figure -> it's confirmed that this function is for `pow_sign` and `pow_gen`.
- Another important highlighted code snippet:
function () {
var x = var_a.proxy1, _ᖙᕴᕾᕶ = ["$_CBJD"].concat(x), _ᖗᖗᖘᕾ = _ᖙᕴᕾᕶ[1];
_ᖙᕴᕾᕶ.shift();
var _ᖈᕹᕴᖃ = _ᖙᕴᕾᕶ[0];
var arr1 = "undefined" != typeof self ? self : "undefined" != typeof global ? global : this;
arr1._lib = {"W4Ec": "7RXi"}, arr1.lib = arr1.lib || {}, arr1.lib._abo = {"(n[21:24])+.+(n[7:14])+.+(n[10:13]+n[19:22])":"n[26:29]"};
}()
- This looks like a pattern for string slicing or indexing. I need to confirm whether it is used in `parselotString()`, because such slicing with `+.+` was involved in that function too.
`{"W4Ec": "7RXi"}` seems interesting... let's dig further. We can see `window._lib`, since `_lib` is stored on the global `window` object.
case var_a.proxy2()[0][10]:
(0, _ᖉᖘᖂᕾ.$_BBL)(x, {gee_guard: _ᖙᕴᕾᕶ.geeGuard}), (0, _ᖉᖘᖂᕾ.$_BBL)(x, window._lib || {});
var i = (0, _ᖉᖘᖂᕾ.getStringByIndexes)(_ᖙᕴᕾᕶ.lot, _ᖙᕴᕾᕶ.lotNumber), o = (0, _ᖉᖘᖂᕾ.getStringByIndexes)(_ᖙᕴᕾᕶ.lotRes, _ᖙᕴᕾᕶ.lotNumber), r = i.split("."), i = {};
r.reduce(function (x, _ᖙᕴᕾᕶ, _ᖗᖗᖘᕾ) {
var _ᖈᕹᕴᖃ = var_a.proxy1, arr1 = ["$_BICED"].concat(_ᖈᕹᕴᖃ), _ᕸᕺᖂᖗ = arr1[1];
arr1.shift();
var _ᖆᕸᕶᖁ = arr1[0];
return _ᖗᖗᖘᕾ === r.length - 1 ? x[_ᖙᕴᕾᕶ] = o : x[_ᖙᕴᕾᕶ] || (x[_ᖙᕴᕾᕶ] = {}), x[_ᖙᕴᕾᕶ];
}, i), (0, _ᖉᖘᖂᕾ.$_BBL)(x, i), x.em = {}, (0, _ᖂᕶᖃᖂ.default)([], x.em);
x = (0, _ᖆᖚᕾᕺ.default)(_ᕸᕿᖂᖁ.default.stringify(x), _ᖆᕸᕶᖁ), x = {callback: "", captcha_id: _ᖙᕴᕾᕶ.captchaId, challenge: _ᖙᕴᕾᕶ.challenge, client_type: _ᖙᕴᕾᕶ.clientType, lot_number: _ᖙᕴᕾᕶ.lotNumber, risk_type: _ᖙᕴᕾᕶ.riskType, payload: _ᖙᕴᕾᕶ.payload, process_token: _ᖙᕴᕾᕶ.processToken, payload_protocol: _ᖙᕴᕾᕶ.payloadProtocol, pt: _ᖙᕴᕾᕶ.pt, w: x};
(_ᖆᕸᕶᖁ.extraData && "android" === _ᖙᕴᕾᕶ.clientType || "ios" === _ᖙᕴᕾᕶ.clientType && !_ᖙᕴᕾᕶ.post) && (x.GeeToken = _ᖆᕸᕶᖁ.extraData && _ᖆᕸᕶᖁ.extraData.GeeToken ? _ᖆᕸᕶᖁ.extraData.GeeToken : null), !_ᖙᕴᕾᕶ.checkDevice && x.GeeToken && delete x.GeeToken, (0, _ᕹᕾᕶᖙ.jsonp)(_ᖙᕴᕾᕶ, "verify", x, _ᖗᕺᕴᖁ).$_JAH(function (x) {
var _ᖙᕴᕾᕶ = var_a.proxy1, _ᖈᕹᕴᖃ = ["$_BICJi"].concat(_ᖙᕴᕾᕶ), _ᕸᕺᖂᖗ = _ᖈᕹᕴᖃ[1];
_ᖈᕹᕴᖃ.shift();
var _ᖘᕹᕴᕸ = _ᖈᕹᕴᖃ[0];
var _ᖂᕺᕸᖂ = _ᖆᕸᕶᖁ.resultAdapt(x);
if ("error" === _ᖂᕺᕸᖂ.status) return (0, _ᖀᕵᕾᖃ.throwError)((0, _ᖀᕵᕾᖃ.getServerError)(x, _ᖆᕸᕶᖁ, "/verify.php"));
_ᖗᖗᖘᕾ ? arr1(_ᖂᕺᕸᖂ.data) : _ᖆᕸᕶᖁ.handleResult(_ᖂᕺᕸᖂ.data, arr1);
}, function () {
var x = var_a.proxy1, _ᖙᕴᕾᕶ = ["$_BIDEJ"].concat(x), _ᖗᖗᖘᕾ = _ᖙᕴᕾᕶ[1];
_ᖙᕴᕾᕶ.shift();
var _ᖈᕹᕴᖃ = _ᖙᕴᕾᕶ[0];
return (0, _ᖀᕵᕾᖃ.throwError)((0, _ᖀᕵᕾᖃ.getError)("url_verify", _ᖆᕸᕶᖁ));
});
- I am a bit sorry for pasting this large snippet, but I felt it's important.
x = {
callback: "",
captcha_id: _ᖙᕴᕾᕶ.captchaId,
challenge: _ᖙᕴᕾᕶ.challenge,
client_type: _ᖙᕴᕾᕶ.clientType,
lot_number: _ᖙᕴᕾᕶ.lotNumber,
risk_type: _ᖙᕴᕾᕶ.riskType,
payload: _ᖙᕴᕾᕶ.payload,
process_token: _ᖙᕴᕾᕶ.processToken,
payload_protocol: _ᖙᕴᕾᕶ.payloadProtocol,
pt: _ᖙᕴᕾᕶ.pt,
w: x
};
- This is the final object that is sent to the server. The main thing to notice here is `w: x` -> the encrypted object.
var i = (0, _ᖉᖘᖂᕾ.getStringByIndexes)(_ᖙᕴᕾᕶ.lot, _ᖙᕴᕾᕶ.lotNumber),
o = (0, _ᖉᖘᖂᕾ.getStringByIndexes)(_ᖙᕴᕾᕶ.lotRes, _ᖙᕴᕾᕶ.lotNumber),
r = i.split("."),
i = {};
- Now we need to see how `w` is being encrypted. I tried searching with `pow_sign` and `pow_msg`, but found nothing. I went back to the two encryption functions: one was AES and the other seemed to be RSA.
_ᖗᖗᖘᕾ = function (data, var_mm) {
var _ᖗᖗᖘᕾ = var_a.prodxy1, _ᖈᕹᕴᖃ = ["$_DABJy"].concat(_ᖗᖗᖘᕾ), arr1 = _ᖈᕹᕴᖃ[1];
_ᖈᕹᕴᖃ.shift();
var _ᕸᕺᖂᖗ = _ᖈᕹᕴᖃ[0];
if (!(var_oo = var_mm.options).pt || "0" === var_oo.pt) return _ᖘᕹᕴᕸ.default.urlsafe_encode(data);
var str__1 = (0, _ᕹᕾᕶᖙ.guid)(), _ᕷᖉᕹᖈ = new ("1", "2"), var_mm = {1: {symmetrical: _ᖁᖈᕶᕶ.default, asymmetric: new _ᖆᖆᖘᕴ.default}, 2: {symmetrical: new _ᕿᕶᖆᕺ.default({key: str__1, mode: "cbc", iv: "0000000000000000"}), asymmetric: _ᖉᖘᖂᕾ.default}};
if (_ᕷᖉᕹᖈ.$_CCp(var_oo.pt)) {
var i = "1" === var_oo.pt, var_oo = var_oo.pt, r = var_mm.asymmetric.encrypt(str__1);
while (i && (!r || 256 !== r.length)) str__1 = (0, _ᕹᕾᕶᖙ.guid)(), r = (new _ᖆᖆᖘᕴ.default).encrypt(str__1);
data = var_mm.symmetrical.encrypt(data, str__1);
return (0, _ᕹᕾᕶᖙ.arrayToHex)(data) + r;
}
};
I have modified the variable naming a bit. I think this is the encryption for `w`, as you can see `pt` being passed here. But to make sure we are in the correct function, we need to verify through the params: what is `data`? I verified it by setting a breakpoint at this line. When the slider was completed, the code flow paused and the breakpoint was hit. So yes, the encryption routine for `w` is verified.
"{"setLeft":114,"passtime":1359,"userresponse":115.32608753280493,"device_id":"","lot_number":"0c4fc046a1d6443598e4dcfb868f0af6","pow_msg":"1|8|sha256|2025-09-09T05:30:45.059386+08:00|54088bb07d2df3c46b79f80300b0abbe|0c4fc046a1d6443598e4dcfb868f0af6||457cedec1f1049b5","pow_sign":"00593ae56a490c46d087b7e4ad60a7278f74b9a9f97482862061384488ff08c3","geetest":"captcha","lang":"zh","ep":"123","biht":"1426265548","gee_guard":{"roe":{"aup":"3","sep":"3","egp":"3","auh":"3","rew":"3","snh":"3","res":"3","cdc":"3"}},"W4Ec":"7RXi","cfb8":{"6a1d6443":{"d6444dcf":"8f0a"}},"em":{"ph":0,"cp":0,"ek":"11","wd":1,"nt":0,"si":0,"sc":0}}"
This is what is passed as `data`. This data is our `w`, which is being encrypted here. But we still need to find out how `w` is being formed. We can see other details too in this image.
Till now, I have got the code snippets and other necessary things we need for encryption/decryption.
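The structure of the encryption routine for `w` shown earlier can be summarised with a short Python sketch. It only mirrors the control flow of that function for the `pt == "1"` branch. The actual ciphers are passed in as callables, and the stand-ins used below (`fake_aes`, `fake_rsa`) are placeholders with no real cryptography in them. `secrets.token_hex(8)` is an assumed substitute for `guid()`.

```python
import secrets


def encrypt_w(data, symmetric_encrypt, asymmetric_encrypt,
              new_key=lambda: secrets.token_hex(8)):
    """Hybrid layout: hex(symmetric(data, key)) followed by asymmetric(key)."""
    key = new_key()
    wrapped_key = asymmetric_encrypt(key)
    # The original retries until the wrapped key is exactly 256 hex characters.
    while not wrapped_key or len(wrapped_key) != 256:
        key = new_key()
        wrapped_key = asymmetric_encrypt(key)
    ciphertext = symmetric_encrypt(data, key)
    return ciphertext.hex() + wrapped_key  # arrayToHex(data) + r


# Placeholder ciphers, for demonstrating the layout only (NOT secure).
def fake_aes(data, key):
    return bytes(b ^ ord(key[i % len(key)]) for i, b in enumerate(data.encode()))


def fake_rsa(key):
    return key.encode().hex().rjust(256, "0")


demo_key = "0123456789abcdef"
w = encrypt_w('{"setLeft":114}', fake_aes, fake_rsa, new_key=lambda: demo_key)
print(len(w))  # 30 hex chars of ciphertext + 256 hex chars of wrapped key
```

To decrypt this, the server would unwrap the last 256 hex characters with its private key to recover the random key, and then decrypt the rest with it.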
After all this encryption, the data is sent to the `/verify` endpoint, and if it is validated, we receive the `captcha_output` along with the whole `seccode` field.
I tried using the debugger, but there may be an anti-debugger check that makes the attempt fail: it either results in a failCount or, if debugging delays things, a `TIME_OUT` error is displayed.
Part-2
Now I am implementing all the functions and algorithms that I have reversed so far, and then we'll verify whether it works.
val = {setLeft: val, passtime: _ᖁᖈᕶᕶ, userresponse: val / _ᕸᕺᖂᖗ.$_BHDP + 2};
When I looked for setLeft, passtime, and userresponse, I got this. So `setLeft` and `userresponse` depend on some value. We need to figure out how it is calculated, and the same goes for passtime and the other values.
More params of w can be seen here.
{
"setLeft": 114,
"passtime": 1359,
"userresponse": 115.32608753280493,
"device_id": "",
"lot_number": "0c4fc046a1d6443598e4dcfb868f0af6",
"pow_msg": "1|8|sha256|2025-09-09T05:30:45.059386+08:00|54088bb07d2df3c46b79f80300b0abbe|0c4fc046a1d6443598e4dcfb868f0af6||457cedec1f1049b5",
"pow_sign": "00593ae56a490c46d087b7e4ad60a7278f74b9a9f97482862061384488ff08c3",
"geetest": "captcha",
"lang": "zh",
"ep": "123",
"biht": "1426265548",
"gee_guard": {
"roe": {
"aup": "3",
"sep": "3",
"egp": "3",
"auh": "3",
"rew": "3",
"snh": "3",
"res": "3",
"cdc": "3"
}
},
"W4Ec": "7RXi",
"cfb8": {
"6a1d6443": {
"d6444dcf": "8f0a"
}
},
"em": {
"ph": 0,
"cp": 0,
"ek": "11",
"wd": 1,
"nt": 0,
"si": 0,
"sc": 0
}
}
From the above params, we need to decide which ones are always the same, i.e. static.
I verified this by running the debugger multiple times.
- I got the value of the denominator in userresponse, which is 1.0059466666666665. I also verified that it is constant.
`setLeft` must be related to the slider position, and `passtime` should be the time taken to solve the captcha. It can be set to some random value between 1s and 3s; note that this function measures it in ms.
How will we calculate the slider position offset? <- We will come to this in a while.
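As a quick sanity check, the relation `userresponse = setLeft / constant + 2` can be tested against the captured payload. The sketch below assumes the constant found in the debugger and picks `passtime` at random in the 1s to 3s range; the helper name `slide_fields` is hypothetical.

```python
import random

SCALE = 1.0059466666666665  # constant denominator observed in the debugger


def slide_fields(set_left):
    """Build the three slide-related fields of w for a given slider offset."""
    return {
        "setLeft": set_left,
        "passtime": random.randint(1000, 3000),  # milliseconds
        "userresponse": set_left / SCALE + 2,
    }


fields = slide_fields(114)
print(fields["userresponse"])  # close to the captured 115.32608753280493
```

With `setLeft = 114` this lands on the `userresponse` seen in the captured `w`, which supports the formula.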
- Another thing to look at is this chunk:
"cfb8": { "6a1d6443": { "d6444dcf": "8f0a" } },
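- If we look carefully, this chunk is generated from the lot_number: 0c4fc046a1d6443598e4dcfb868f0af6. Refer to this:
"(n[21:24])+.+(n[7:14])+.+(n[10:13]+n[19:22])":"n[26:29]"
I have already mentioned this code snippet above. Taking each slice with both ends inclusive (0-based), n[21:24] is cfb8, n[7:14] is 6a1d6443, n[10:13]+n[19:22] is d6444dcf, and n[26:29] is 8f0a.
- For now, I have left `setLeft`. I will come back to it.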
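The `cfb8` chunk can be reproduced from the lot_number with a few lines of Python. This is my own reading of the pattern, not the deobfuscated `getStringByIndexes`: I am assuming 0-based slices with both ends inclusive and `+.+` as the nesting separator, and the function names are hypothetical.

```python
import re

SLICE = re.compile(r"n\[(\d+):(\d+)\]")


def apply_slices(expr, n):
    """Concatenate every n[a:b] slice in expr, treating both ends as inclusive."""
    return "".join(n[int(a): int(b) + 1] for a, b in SLICE.findall(expr))


def build_lot_dict(mapping, lot_number):
    """Expand {"(k1)+.+(k2)+.+(k3)": "value"} into nested dicts keyed by the slices."""
    result = {}
    for key_expr, value_expr in mapping.items():
        node = result
        parts = key_expr.split("+.+")
        for part in parts[:-1]:
            node = node.setdefault(apply_slices(part, lot_number), {})
        node[apply_slices(parts[-1], lot_number)] = apply_slices(value_expr, lot_number)
    return result


pattern = {"(n[21:24])+.+(n[7:14])+.+(n[10:13]+n[19:22])": "n[26:29]"}
lot_dict = build_lot_dict(pattern, "0c4fc046a1d6443598e4dcfb868f0af6")
print(lot_dict)
# {'cfb8': {'6a1d6443': {'d6444dcf': '8f0a'}}}
```

Since the output matches the captured payload, this part of `w` only needs the `lot_number` from the load response.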
Request Part
- Now I am implementing the request part and fetching all the required params.
The load URL is:
https://gcaptcha4.geetest.com/load?captcha_id=54088bb07d2df3c46b79f80300b0abbe&challenge=b915db78-2bdf-4ece-831d-060052c156cf&client_type=web&risk_type=slide&lang=eng&callback=geetest_1757578032315
We need:
- captcha_id [fixed]
- challenge
- client_type [fixed]
- risk_type [fixed]
- lang [fixed]
- callback
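A minimal sketch of assembling that URL in Python, using only the parameters listed above. The helper name `build_load_url` is hypothetical, and generating a fresh UUID when no `challenge` is given is my assumption (the captured value has the shape of one).

```python
import uuid
from urllib.parse import urlencode

LOAD_ENDPOINT = "https://gcaptcha4.geetest.com/load"


def build_load_url(captcha_id, callback, challenge=""):
    """Assemble the /load URL from the fixed and per-request parameters."""
    params = {
        "captcha_id": captcha_id,
        "challenge": challenge or str(uuid.uuid4()),  # assumption: fresh UUID
        "client_type": "web",
        "risk_type": "slide",
        "lang": "eng",
        "callback": callback,
    }
    return f"{LOAD_ENDPOINT}?{urlencode(params)}"


url = build_load_url(
    "54088bb07d2df3c46b79f80300b0abbe",
    "geetest_1757578032315",
    challenge="b915db78-2bdf-4ece-831d-060052c156cf",
)
print(url)
```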
When I used this in my script,
callback = f"geetest_{int(time.time())}"
[
"{\"status\":\"error\",\"code\":\"-50004\",\"msg\":\"jsonp xss\",\"desc\":{\"type\":\"defined error\"}}",
"geetest_1757647992"
]
`jsonp xss` was thrown: https://stackoverflow.com/questions/8750469/apparent-jsonp-xss-vulnerability
- It told me to fix the callback because of the potential for an XSS vulnerability.
- How to fix this? I went back to `gt4.js` and looked for the `random()` function there:
var random = function () {
return parseInt(Math.random() * 10000) + (new Date()).valueOf();
};
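In Python, a callback built the same way as `random()` in `gt4.js` could look like this. My understanding is that the callback is `geetest_` followed by that millisecond-based number, which matches the 13-digit suffix in the load URL above; the function name `jsonp_callback` is hypothetical.

```python
import random
import time


def jsonp_callback():
    """Mimic gt4.js: parseInt(Math.random() * 10000) + (new Date()).valueOf()."""
    suffix = int(random.random() * 10000) + int(time.time() * 1000)
    return f"geetest_{suffix}"


callback = jsonp_callback()
print(callback)  # e.g. geetest_1757578032315 (millisecond-based suffix)
```

The earlier attempt used `int(time.time())`, which is in seconds and gives a 10-digit suffix, so the server rejected it.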
Now the only thing left to work on is the `setLeft` param. To calculate setLeft, I need the coordinates of the puzzle block in the background image. At the beginning of this md file, I shared a blog that calculates the coordinates of the puzzle piece via OpenCV, and with its help I was able to compute the setLeft param. Once that is done, the captcha is solved.
The `seccode` is received, and in it we can see the `captcha_output` -> thus we have solved the captcha.
'captcha_output': 'l93Zj4V7B8W_VKlCz_L-i8rIXXS5VQvcXnbDLTmTBNHeIW47CYti7Q1DgZyvCH0gHldaaxJ57eQ9OpFDhMWRg-57PeFOpx6tKzJGuDPLjnWoN_AyZ9Ve3WhPcoxRUtJbOVq-m5_fWAMvKaz6t8xkIdPnoRW80E50QQHJtKJhWzHm1dilL1RD3CP144aHIRj-qqF-xprSb5oRvWPf9U0cUw80yUfE2XZFRObCtxjazvDoXINl70XkKj3cqLg3xlyai-W0wiXC3RQKcUSpqKv_Eb8Zm3ncZw18Ovf4bjfxA9qMIl_MUNQbrmnTARQYEgyzcDwiIrddpHk0AyCpbDqWiH4Iz5deW-2i1YXh4lIGWYpYHEfi3K2UI1sdBQCOwj6EGsBn9LotLygyjfqxzIZQA0PCYeIJUSflk7OBELd6-kE=
- Thanks for your time reading this. Hope you liked it, and sorry for bad formatting of the writeup. :)