
Reversing GeeTest Captcha

Hi again. In this blog post, I will discuss how I approached reversing and solving the GeeTest captcha.

Goal

  • The goal is to solve the GeeTest captcha programmatically, without using any automation tools.

  • I first tried to understand how this captcha works and found some approaches, but they all involved browser automation at the final stage.

  • The most common approach was:
    • using OpenCV to get the edges of the piece (the one meant to be dropped into the hole in the image) and of the hole,
    • calculating the offset between the two and positioning the mouse cursor accordingly (in the script), and
    • finally, using an automation tool for the last step, which is sliding the cursor (this is only one type of captcha; there are other types of puzzles too).

    • I am sharing the links of the approaches I found:
      1. Geetest Captcha Solver with JS
      2. Datadome Geetest Captcha Solver
  • Since the task was to solve it programmatically and to avoid any kind of open-source solver or automation tool, I decided to dive in at the root level and figure out how this captcha works. [There was a time constraint, but I thought it was worth a try.]

  • Please refer to my repo for the files mentioned below. Repo

Part-1

  • I opened the developer tools and inspected the files that were loaded when the captcha was displayed and when it was verified.

load

  • When the captcha is solved: verify

  • The interesting part is captcha_output: we need to reverse engineer how it is generated, and for that we need to go through the files and reverse the logic behind it.

  • As we can see, there are many files related to gcaptcha.... I went through all of them and tried to understand them.

  • One file was heavily obfuscated (approximately 11k+ lines of code due to the obfuscation), and that was enough to press the panic button.

  • I tried beautifying the .js scripts with online tools like https://beautifier.io/

  • I searched the code for some relevant keywords to verify that I was working on the correct file.

  • I looked at the first function in the gcaptcha4.js file. It was obfuscated through control-flow flattening, since we can see the dispatcher and many branches here. I reversed it manually, since it was pretty clear that it decrypts the long URI string with the key (0ZZi2, and stores the decrypted strings as an array; the function then takes an index and returns the string stored at that index.

  • See decode_func1.js; the result is stored in decrypted_output_1.txt and decrypted_output.txt

  • Moving forward, the next function also involved control-flow flattening, but I think it wasn't of much help. The major portion of the code is obfuscated in a different way.

  • I tried many online deobfuscators, but none of them helped. [Here I wasted a lot of time searching for deobfuscators.]

  • Then I shifted to another Herculean task: writing my own deobfuscator. I tried refactoring the code with the help of https://astexplorer.net/ [to understand its AST structure]…. Again I wasted a lot of time here, around 3 hours, because I still didn't get any good results.

  • Then I came to the last option: reversing the code manually. It took a lot of time, but at least it was giving some results.

  • I used regex.py (provided) to replace all those array-indexing expressions with the corresponding entries of the big array that I had decoded and stored earlier. So by now the code is a bit cleaner and a bit more understandable.
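
The substitution step can be sketched roughly as follows. This is not the actual regex.py from the repo: the accessor names (`_a`, `_b`), the call shape `name(<index>)` and the helper name `inline_decoded_strings` are all assumptions made for illustration, and the real obfuscated accessors may look different.

```python
import json
import re


def inline_decoded_strings(source: str, table: list[str], accessor_names: list[str]) -> str:
    """Replace calls like `_a(12)` with the decoded string literal table[12].

    `accessor_names` are hypothetical names of the lookup functions; pass in
    whatever the obfuscated file actually uses.
    """
    names = "|".join(re.escape(name) for name in accessor_names)
    # The lookbehind avoids matching the tail of a longer identifier.
    pattern = re.compile(rf"(?<![\w$])(?:{names})\((\d+)\)")

    def substitute(match: re.Match) -> str:
        index = int(match.group(1))
        if index >= len(table):
            return match.group(0)  # leave unknown indexes untouched
        # json.dumps yields a valid, properly escaped JS string literal.
        return json.dumps(table[index], ensure_ascii=False)

    return pattern.sub(substitute, source)


if __name__ == "__main__":
    decoded = ["foo", "bar"]
    print(inline_decoded_strings("x[_a(0)] = _b(1) + other(2);", decoded, ["_a", "_b"]))
```

Leaving out-of-range indexes untouched makes it easy to rerun the pass after decoding more of the array.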

 function detectAutomation(env) {
    const checks = {};
    //check for PhantomJS
    checks.phantom = ("_phantom" in env || "callPhantom" in env) ? true : false;
    // check for WebDriver
    checks.webdriver = ("webdriver" in navigator) ? true : false;
    //  check for Nightmare.js
    checks.nightmare = ("__nightmare" in env) ? true : false;
    // Check for ChromeDriver artifacts
    checks.cdc = ("$cdc_asdjflasutopfhvcZLmcfl_" in env) ? true : false;
    // Try object descriptor test
    try {
        const desc = Object.getOwnPropertyDescriptor(navigator, "webdriver");
        checks.webdriverDescriptor = typeof desc === "object";
    } catch (e) {
        checks.webdriverDescriptor = false;
    }
    const susSubstrings = ["ph", "cp", "ek", "wd", "nt", "si", "sc"];
    for (const key of Object.keys(env)) {
        if (susSubstrings.some(sig => key.includes(sig))) {
            checks["sus:" + key] = true;
        }
    }
    return checks;
 }
  • This script basically checks for some automation libraries and frameworks: a bot-detection snippet.

  • Now my focus is to find the logic behind those params that we saw when the captcha is verified. How is the encryption handled, and which cryptographic algorithm is it using?

  • Lines 2500-2900 handle the captcha's events and handlers. I am unable to paste such a large chunk of code here.

key

This looked very interesting to me: it seemed to be a public key, 129 bytes long. I wanted to confirm the algorithm first.

>>> int("10001", 16)
65537  <--- this is a famous number: the public exponent e in the RSA algorithm
  • The other params, like n, e, d, p, q, mostly look like RSA params. Still, we can't be sure this captcha uses RSA, because I saw other algorithms too while going through the code. And since RSA is an asymmetric algorithm, we would still need the private key to decrypt anything.

  • This code snippet listens for and monitors our mouse moves.

  var u = 0, c = function x(_ᖙᕴᕾᕶ) {
            var _ᖗᖗᖘᕾ = var_a.proxy1, _ᖈᕹᕴᖃ = ["$_DCIJm"].concat(_ᖗᖗᖘᕾ), arr1 = _ᖈᕹᕴᖃ[1];
            _ᖈᕹᕴᖃ.shift();
            var _ᖘᕹᕴᕸ = _ᖈᕹᕴᖃ[0];
            if (256 <= (u = u || 0) || _ᕸᕺᖂᖗ <= a) window.removeEventListener ? (u = 0, window.removeEventListener("mousemove", x, false)) : window.detachEvent && (u = 0, window.detachEvent("onmousemove", x)); else try {
              var s = _ᖙᕴᕾᕶ.x + _ᖙᕴᕾᕶ.y;
              _ᖆᕸᕶᖁ[a++] = 255 & s, u += 1;
            } catch (e) {}

aes

  • Looks like we have AES in CBC mode with iv: 0000000000000000. The key may be var str__1 = (0, _ᕹᕾᕶᖙ.guid)(), some random string, though I'm not sure at this point.

  • CONSISTENT: u = "3", UNSURE: "2", INCONSISTENT: _ = "1", TESTS: a = {PHANTOM_UA: "aup", PHANTOM_PROPERTIES: "sep", PHANTOM_LANGUAGE: "egp", HEADCHR_UA: "auh", WEBDRIVER: "rew", HEADCHR_PERMISSIONS: "snh", SELENIUM_DRIVER: "res", CDC: "cdc"}}); <- Some useful data [Lines 4300 to 4500 monitor the browser details and other frameworks]

  • Lines 3600 to 4000 or so contain implementations of some crypto-related algorithms, like crc32, base64, shifts, padding and many others…

lot

  • I was stuck at this code snippet. It contains pow_msg, device_id, lot_number and pow_sign: what are these? They must be involved in the encryption part.

  • parseLotString and getStringByIndexes: I looked more closely at these. See the parselotString.js file (provided) and the getStringByIndexes.js function for their deobfuscated versions…

pow

  • It needs: pow_detail.hashfunc, pow_detail.bits, pow_detail.version, pow_detail.dateTime, lotNumber and captchaID. These must be needed to generate pow_msg and pow_sign.

  • Next I searched for the keyword pow_sign and found this snippet. pow_gen

  • I have messed up the variable naming, and I am very sorry for that, but I can guess the params here.
  • var a, _ = bits % 4, u = parseInt(bits / 4, 10), c = (a = "0", new Array(u + 1).join(a)), h = version + "|" + bits + "|" + hashfunc + "|" + datetime + "|" + captchaID + "|" + lotNumber + "|" + emptystring + "|"; Using the order of params from the previous figure, this confirms that this function builds pow_msg and pow_sign.
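
A minimal Python sketch of this proof-of-work step, under a few assumptions. The prefix layout is taken from the h string above. The trailing nonce is assumed to be 16 random hex characters (as in the captured pow_msg), and the acceptance rule is assumed to be "the digest has `bits` leading zero bits", which for bits divisible by 4 is the same as the "0" * (bits / 4) hex prefix built in c. The function name `solve_pow` is mine, not GeeTest's.

```python
import hashlib
import secrets


def solve_pow(version: str, bits: int, hashfunc: str, datetime_str: str,
              captcha_id: str, lot_number: str) -> tuple[str, str]:
    """Return (pow_msg, pow_sign) where pow_sign has `bits` leading zero bits."""
    # Same field order as the h string; the empty field gives the "||".
    prefix = f"{version}|{bits}|{hashfunc}|{datetime_str}|{captcha_id}|{lot_number}||"
    while True:
        nonce = secrets.token_hex(8)  # assumption: 16 random hex characters
        pow_msg = prefix + nonce
        digest = hashlib.new(hashfunc, pow_msg.encode()).digest()
        # Assumed acceptance rule: the top `bits` bits of the digest are zero.
        if int.from_bytes(digest, "big") >> (len(digest) * 8 - bits) == 0:
            return pow_msg, digest.hex()


if __name__ == "__main__":
    msg, sign = solve_pow("1", 8, "sha256", "2025-09-09T05:30:45.059386+08:00",
                          "54088bb07d2df3c46b79f80300b0abbe",
                          "0c4fc046a1d6443598e4dcfb868f0af6")
    print(msg)
    print(sign)
```

With bits = 8 the loop needs about 256 attempts on average, so it finishes almost instantly.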

  • Another important highlighted code snippet,
  function () {
    var x = var_a.proxy1, _ᖙᕴᕾᕶ = ["$_CBJD"].concat(x), _ᖗᖗᖘᕾ = _ᖙᕴᕾᕶ[1];
    _ᖙᕴᕾᕶ.shift();
    var _ᖈᕹᕴᖃ = _ᖙᕴᕾᕶ[0];
    var arr1 = "undefined" != typeof self ? self : "undefined" != typeof global ? global : this;
    arr1._lib = {"W4Ec": "7RXi"}, arr1.lib = arr1.lib || {}, arr1.lib._abo = {"(n[21:24])+.+(n[7:14])+.+(n[10:13]+n[19:22])":"n[26:29]"};
  }()
  • Looks like a pattern for string slicing or indexing. I need to confirm whether it is used in parseLotString(), because that function also involves this kind of +.+ slicing. {"W4Ec": "7RXi"} seems interesting... let's dig further. Since _lib is stored on the global window object, we can see it as window._lib.
 
 case var_a.proxy2()[0][10]:
                (0, _ᖉᖘᖂᕾ.$_BBL)(x, {gee_guard: _ᖙᕴᕾᕶ.geeGuard}), (0, _ᖉᖘᖂᕾ.$_BBL)(x, window._lib || {});
                var i = (0, _ᖉᖘᖂᕾ.getStringByIndexes)(_ᖙᕴᕾᕶ.lot, _ᖙᕴᕾᕶ.lotNumber), o = (0, _ᖉᖘᖂᕾ.getStringByIndexes)(_ᖙᕴᕾᕶ.lotRes, _ᖙᕴᕾᕶ.lotNumber), r = i.split("."), i = {};
                r.reduce(function (x, _ᖙᕴᕾᕶ, _ᖗᖗᖘᕾ) {
                  var _ᖈᕹᕴᖃ = var_a.proxy1, arr1 = ["$_BICED"].concat(_ᖈᕹᕴᖃ), _ᕸᕺᖂᖗ = arr1[1];
                  arr1.shift();
                  var _ᖆᕸᕶᖁ = arr1[0];
                  return _ᖗᖗᖘᕾ === r.length - 1 ? x[_ᖙᕴᕾᕶ] = o : x[_ᖙᕴᕾᕶ] || (x[_ᖙᕴᕾᕶ] = {}), x[_ᖙᕴᕾᕶ];
                }, i), (0, _ᖉᖘᖂᕾ.$_BBL)(x, i), x.em = {}, (0, _ᖂᕶᖃᖂ.default)([], x.em);
                x = (0, _ᖆᖚᕾᕺ.default)(_ᕸᕿᖂᖁ.default.stringify(x), _ᖆᕸᕶᖁ), x = {callback: "", captcha_id: _ᖙᕴᕾᕶ.captchaId, challenge: _ᖙᕴᕾᕶ.challenge, client_type: _ᖙᕴᕾᕶ.clientType, lot_number: _ᖙᕴᕾᕶ.lotNumber, risk_type: _ᖙᕴᕾᕶ.riskType, payload: _ᖙᕴᕾᕶ.payload, process_token: _ᖙᕴᕾᕶ.processToken, payload_protocol: _ᖙᕴᕾᕶ.payloadProtocol, pt: _ᖙᕴᕾᕶ.pt, w: x};
                (_ᖆᕸᕶᖁ.extraData && "android" === _ᖙᕴᕾᕶ.clientType || "ios" === _ᖙᕴᕾᕶ.clientType && !_ᖙᕴᕾᕶ.post) && (x.GeeToken = _ᖆᕸᕶᖁ.extraData && _ᖆᕸᕶᖁ.extraData.GeeToken ? _ᖆᕸᕶᖁ.extraData.GeeToken : null), !_ᖙᕴᕾᕶ.checkDevice && x.GeeToken && delete x.GeeToken, (0, _ᕹᕾᕶᖙ.jsonp)(_ᖙᕴᕾᕶ, "verify", x, _ᖗᕺᕴᖁ).$_JAH(function (x) {
                  var _ᖙᕴᕾᕶ = var_a.proxy1, _ᖈᕹᕴᖃ = ["$_BICJi"].concat(_ᖙᕴᕾᕶ), _ᕸᕺᖂᖗ = _ᖈᕹᕴᖃ[1];
                  _ᖈᕹᕴᖃ.shift();
                  var _ᖘᕹᕴᕸ = _ᖈᕹᕴᖃ[0];
                  var _ᖂᕺᕸᖂ = _ᖆᕸᕶᖁ.resultAdapt(x);
                  if ("error" === _ᖂᕺᕸᖂ.status) return (0, _ᖀᕵᕾᖃ.throwError)((0, _ᖀᕵᕾᖃ.getServerError)(x, _ᖆᕸᕶᖁ, "/verify.php"));
                  _ᖗᖗᖘᕾ ? arr1(_ᖂᕺᕸᖂ.data) : _ᖆᕸᕶᖁ.handleResult(_ᖂᕺᕸᖂ.data, arr1);
                }, function () {
                  var x = var_a.proxy1, _ᖙᕴᕾᕶ = ["$_BIDEJ"].concat(x), _ᖗᖗᖘᕾ = _ᖙᕴᕾᕶ[1];
                  _ᖙᕴᕾᕶ.shift();
                  var _ᖈᕹᕴᖃ = _ᖙᕴᕾᕶ[0];
                  return (0, _ᖀᕵᕾᖃ.throwError)((0, _ᖀᕵᕾᖃ.getError)("url_verify", _ᖆᕸᕶᖁ));
                });
  • I am a bit sorry for pasting such a large snippet, but I felt it was important.
 
 x = {
    callback: "",
    captcha_id: _ᖙᕴᕾᕶ.captchaId,
    challenge: _ᖙᕴᕾᕶ.challenge,
    client_type: _ᖙᕴᕾᕶ.clientType,
    lot_number: _ᖙᕴᕾᕶ.lotNumber,
    risk_type: _ᖙᕴᕾᕶ.riskType,
    payload: _ᖙᕴᕾᕶ.payload,
    process_token: _ᖙᕴᕾᕶ.processToken,
    payload_protocol: _ᖙᕴᕾᕶ.payloadProtocol,
    pt: _ᖙᕴᕾᕶ.pt,
    w: x
 };

w

  • This is the final object that is sent to the server. The main thing to notice here is w: x -> the encrypted object
 
  var i = (0, _ᖉᖘᖂᕾ.getStringByIndexes)(_ᖙᕴᕾᕶ.lot, _ᖙᕴᕾᕶ.lotNumber), 
                o = (0, _ᖉᖘᖂᕾ.getStringByIndexes)(_ᖙᕴᕾᕶ.lotRes, _ᖙᕴᕾᕶ.lotNumber), 
                r = i.split("."), 
                i = {};
  • Now we need to see how w is encrypted. I tried searching for pow_sign and pow_msg, but found nothing. So I went back to the two encryption functions: one was AES and the other seemed to be RSA.
 _ᖗᖗᖘᕾ = function (data, var_mm) {
        var _ᖗᖗᖘᕾ = var_a.proxy1, _ᖈᕹᕴᖃ = ["$_DABJy"].concat(_ᖗᖗᖘᕾ), arr1 = _ᖈᕹᕴᖃ[1];
        _ᖈᕹᕴᖃ.shift();
        var _ᕸᕺᖂᖗ = _ᖈᕹᕴᖃ[0];
        if (!(var_oo = var_mm.options).pt || "0" === var_oo.pt) return _ᖘᕹᕴᕸ.default.urlsafe_encode(data);
        var str__1 = (0, _ᕹᕾᕶᖙ.guid)(), _ᕷᖉᕹᖈ = new ("1", "2"), var_mm = {1: {symmetrical: _ᖁᖈᕶᕶ.default, asymmetric: new _ᖆᖆᖘᕴ.default}, 2: {symmetrical: new _ᕿᕶᖆᕺ.default({key: str__1, mode: "cbc", iv: "0000000000000000"}), asymmetric: _ᖉᖘᖂᕾ.default}};
        if (_ᕷᖉᕹᖈ.$_CCp(var_oo.pt)) {
          var i = "1" === var_oo.pt, var_oo = var_oo.pt, r = var_mm.asymmetric.encrypt(str__1);
          while (i && (!r || 256 !== r.length)) str__1 = (0, _ᕹᕾᕶᖙ.guid)(), r = (new _ᖆᖆᖘᕴ.default).encrypt(str__1);
          data = var_mm.symmetrical.encrypt(data, str__1);
          return (0, _ᕹᕾᕶᖙ.arrayToHex)(data) + r;
        }
      };
  • I have modified the variable naming a bit. I think this is the encryption routine for w, since pt is passed in here. But to make sure we are in the correct function, we need to verify it through the params: what is data? I verified this by setting a breakpoint at this line; when the slider was released, execution paused at the breakpoint. So yes, this is confirmed as the encryption routine for w.

  • "{"setLeft":114,"passtime":1359,"userresponse":115.32608753280493,"device_id":"","lot_number":"0c4fc046a1d6443598e4dcfb868f0af6","pow_msg":"1|8|sha256|2025-09-09T05:30:45.059386+08:00|54088bb07d2df3c46b79f80300b0abbe|0c4fc046a1d6443598e4dcfb868f0af6||457cedec1f1049b5","pow_sign":"00593ae56a490c46d087b7e4ad60a7278f74b9a9f97482862061384488ff08c3","geetest":"captcha","lang":"zh","ep":"123","biht":"1426265548","gee_guard":{"roe":{"aup":"3","sep":"3","egp":"3","auh":"3","rew":"3","snh":"3","res":"3","cdc":"3"}},"W4Ec":"7RXi","cfb8":{"6a1d6443":{"d6444dcf":"8f0a"}},"em":{"ph":0,"cp":0,"ek":"11","wd":1,"nt":0,"si":0,"sc":0}}"

  • This is what is passed in as data. It is the plaintext of w, which gets encrypted here. Now we need to find out how it is formed.
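
The overall shape of that routine can be sketched in Python like this. It only mirrors the control flow of the snippet: a fresh random key, the key wrapped by the asymmetric cipher, the data encrypted by the symmetric cipher, and the result returned as hex(ciphertext) + wrapped key. The two ciphers are passed in as callables because the exact AES and RSA parameters are not confirmed here. The names `encrypt_w` and `new_key`, the 16-hex-character key, and treating urlsafe_encode as URL-safe base64 are all assumptions.

```python
import base64
import secrets
from typing import Callable


def encrypt_w(data: bytes, pt: str,
              sym_encrypt: Callable[[bytes, str], bytes],
              asym_encrypt: Callable[[str], str],
              new_key: Callable[[], str] = lambda: secrets.token_hex(8)) -> str:
    """Hybrid encryption flow as read from the deobfuscated function."""
    if not pt or pt == "0":
        # Assumption: urlsafe_encode behaves like URL-safe base64.
        return base64.urlsafe_b64encode(data).decode()

    key = new_key()            # stands in for guid()
    wrapped = asym_encrypt(key)
    # Mirrors the while loop: for pt == "1", retry until the wrapped key
    # is exactly 256 hex characters long.
    while pt == "1" and (not wrapped or len(wrapped) != 256):
        key = new_key()
        wrapped = asym_encrypt(key)

    ciphertext = sym_encrypt(data, key)
    return ciphertext.hex() + wrapped   # arrayToHex(data) + r
```

Passing the ciphers in as parameters keeps the sketch independent of any particular crypto library; a real attempt would plug in AES-CBC with the zero IV and the RSA public key found earlier.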

  • We can see other details too in this image

    debug

  • So far, I have the code snippets and the other things needed for encryption/decryption.

  • After all this encryption, the data is sent to the /verify endpoint, and if it is validated, we receive the captcha_output along with the whole seccode field.

  • I tried using the debugger, but there may be an anti-debugger check that makes the attempt fail: it either results in failCount or, if debugging delays things, a TIME_OUT error is displayed.

Part-2

  • Now I am implementing all the functions and algorithms that I have reversed so far, and then we'll verify whether it works.

  • When I looked for setLeft, passtime and userresponse, I got this: val = {setLeft: val, passtime: _ᖁᖈᕶᕶ, userresponse: val / _ᕸᕺᖂᖗ.$_BHDP + 2};

  • So setLeft and userresponse depend on some value. We need to figure out how it is calculated, and the same goes for passtime and the other values.

  • More params of w can be seen here.

 {
    "setLeft": 114,
    "passtime": 1359,
    "userresponse": 115.32608753280493,
    "device_id": "",
    "lot_number": "0c4fc046a1d6443598e4dcfb868f0af6",
    "pow_msg": "1|8|sha256|2025-09-09T05:30:45.059386+08:00|54088bb07d2df3c46b79f80300b0abbe|0c4fc046a1d6443598e4dcfb868f0af6||457cedec1f1049b5",
    "pow_sign": "00593ae56a490c46d087b7e4ad60a7278f74b9a9f97482862061384488ff08c3",
    "geetest": "captcha",
    "lang": "zh",
    "ep": "123",
    "biht": "1426265548",
    "gee_guard": {
        "roe": {
            "aup": "3",
            "sep": "3",
            "egp": "3",
            "auh": "3",
            "rew": "3",
            "snh": "3",
            "res": "3",
            "cdc": "3"
        }
    },
    "W4Ec": "7RXi",
    "cfb8": {
        "6a1d6443": {
            "d6444dcf": "8f0a"
        }
    },
    "em": {
        "ph": 0,
        "cp": 0,
        "ek": "11",
        "wd": 1,
        "nt": 0,
        "si": 0,
        "sc": 0
    }
 }
  • From the above params, we need to determine which ones are always the same, i.e. static.

  • I verified this by running the debugger multiple times.

user val

  • I got the value of the denominator in userresponse: 1.0059466666666665. I also verified that it is constant.
  • setLeft must be related to the slider position, and passtime should be the time taken to solve the captcha. It can be set to a random value between 1 s and 3 s; the function measures it in ms.
  • How will we calculate the slider position offset? <- we will come to this in a while.
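
As a quick check of the formula userresponse = setLeft / 1.0059466666666665 + 2, here is a small sketch that rebuilds the three slider fields. The helper name `build_slide_fields` is hypothetical, and picking passtime uniformly between 1000 and 3000 ms is just the heuristic described above, not something taken from the GeeTest code.

```python
import random

# Constant observed in the debugger for the denominator ($_BHDP).
SCALE = 1.0059466666666665


def build_slide_fields(set_left: int) -> dict:
    """Build the setLeft / passtime / userresponse part of the payload."""
    return {
        "setLeft": set_left,
        # Heuristic: a plausible solve time in milliseconds.
        "passtime": random.randint(1000, 3000),
        "userresponse": set_left / SCALE + 2,
    }


if __name__ == "__main__":
    # For setLeft = 114 this should print a userresponse of about 115.326,
    # matching the captured payload.
    print(build_slide_fields(114))
```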

  • Another thing to look at is this chunk:
    "cfb8": {
          "6a1d6443": {
              "d6444dcf": "8f0a"
          }
      },
    
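  • If we look carefully, this chunk is generated from the lot_number: 0c4fc046a1d6443598e4dcfb868f0af6.
  • Refer to this: {"(n[21:24])+.+(n[7:14])+.+(n[10:13]+n[19:22])":"n[26:29]"}. I have already mentioned this code snippet above. Each n[a:b] is a slice of the lot_number with both ends inclusive: n[21:24] gives cfb8, n[7:14] gives 6a1d6443, n[10:13]+n[19:22] gives d6444dcf, and n[26:29] gives the value 8f0a.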

    parse
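
The mapping from lot_number to this nested chunk can be reproduced with a short Python sketch. It assumes that every n[a:b] term is a slice with both ends inclusive and that "+.+" separates nesting levels, which is what the captured values suggest. The function names are mine and are not the deobfuscated parseLotString / getStringByIndexes.

```python
import re


def slice_by_indexes(expr: str, lot_number: str) -> str:
    """Concatenate every n[a:b] term in expr, with b treated as inclusive."""
    terms = re.findall(r"n\[(\d+):(\d+)\]", expr)
    return "".join(lot_number[int(a): int(b) + 1] for a, b in terms)


def build_lot_entry(key_pattern: str, value_pattern: str, lot_number: str) -> dict:
    """Turn 'k1+.+k2+.+k3' plus a value pattern into {k1: {k2: {k3: value}}}."""
    keys = [slice_by_indexes(part, lot_number) for part in key_pattern.split("+.+")]
    root: dict = {}
    node = root
    for key in keys[:-1]:
        node = node.setdefault(key, {})
    node[keys[-1]] = slice_by_indexes(value_pattern, lot_number)
    return root


if __name__ == "__main__":
    entry = build_lot_entry("(n[21:24])+.+(n[7:14])+.+(n[10:13]+n[19:22])",
                            "n[26:29]",
                            "0c4fc046a1d6443598e4dcfb868f0af6")
    print(entry)
```

For the lot_number above, this yields the same cfb8 / 6a1d6443 / d6444dcf / 8f0a structure that appears in the captured payload.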

  • For now, I have left setLeft aside. I will come back to it.

Request Part

  • Now, I am implementing the request part and fetching all the required params.

load_

  • Load URL is: https://gcaptcha4.geetest.com/load?captcha_id=54088bb07d2df3c46b79f80300b0abbe&challenge=b915db78-2bdf-4ece-831d-060052c156cf&client_type=web&risk_type=slide&lang=eng&callback=geetest_1757578032315

  • we need

    • captcha_id [fixed]
    • challenge
    • client_type [fixed]
    • risk_type [fixed]
    • lang [fixed]
    • callback

uuid

  • This generates the challenge ID (in gt4.js). callback

  • For headers, I used the same ones the page sends.

  • When I used callback = f"geetest_{int(time.time())}" in my script, I got this response:

   [
    "{\"status\":\"error\",\"code\":\"-50004\",\"msg\":\"jsonp xss\",\"desc\":{\"type\":\"defined error\"}}",
    "geetest_1757647992"
   ]
  
  • A jsonp xss error was thrown: https://stackoverflow.com/questions/8750469/apparent-jsonp-xss-vulnerability
  • It told me to fix the callback because of a potential XSS vulnerability.
  • How to fix this? I went back to gt4.js again and looked for the random() function there.
  var random = function () {
    return parseInt(Math.random() * 10000) + (new Date()).valueOf();
   };
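
A Python sketch of building the load request with a callback in the same style as gt4.js's random(), i.e. a millisecond timestamp plus a small random integer. The fixed parameter values come from the load URL shown earlier. Using uuid4 for challenge is an assumption based on the shape of the captured value, and `make_callback` and `build_load_url` are hypothetical helper names.

```python
import random
import time
import uuid
from urllib.parse import urlencode

LOAD_ENDPOINT = "https://gcaptcha4.geetest.com/load"


def make_callback() -> str:
    """Port of parseInt(Math.random() * 10000) + (new Date()).valueOf()."""
    return f"geetest_{int(random.random() * 10000) + int(time.time() * 1000)}"


def build_load_url(captcha_id: str) -> str:
    params = {
        "captcha_id": captcha_id,
        "challenge": str(uuid.uuid4()),  # assumption: a random UUID is accepted
        "client_type": "web",
        "risk_type": "slide",
        "lang": "eng",
        "callback": make_callback(),
    }
    return f"{LOAD_ENDPOINT}?{urlencode(params)}"


if __name__ == "__main__":
    print(build_load_url("54088bb07d2df3c46b79f80300b0abbe"))
```

The only difference from the failing attempt is the callback: it now uses a millisecond timestamp plus a random offset, like gt4.js, instead of a timestamp in seconds.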
  • Without setLeft, we receive the failure message. failCount

  • Now the only thing left to work on is the setLeft param.

  • To calculate setLeft, I need the coordinates of the puzzle block in the background image. At the beginning of this post, I shared a blog that calculates the coordinates of the puzzle piece via OpenCV, and with its help I was able to compute the setLeft param. Once that is done, the captcha is solved.

    success

  • This seccode is received, and in it we can see the captcha_output -> thus we have solved the captcha.

'captcha_output': 'l93Zj4V7B8W_VKlCz_L-i8rIXXS5VQvcXnbDLTmTBNHeIW47CYti7Q1DgZyvCH0gHldaaxJ57eQ9OpFDhMWRg-57PeFOpx6tKzJGuDPLjnWoN_AyZ9Ve3WhPcoxRUtJbOVq-m5_fWAMvKaz6t8xkIdPnoRW80E50QQHJtKJhWzHm1dilL1RD3CP144aHIRj-qqF-xprSb5oRvWPf9U0cUw80yUfE2XZFRObCtxjazvDoXINl70XkKj3cqLg3xlyai-W0wiXC3RQKcUSpqKv_Eb8Zm3ncZw18Ovf4bjfxA9qMIl_MUNQbrmnTARQYEgyzcDwiIrddpHk0AyCpbDqWiH4Iz5deW-2i1YXh4lIGWYpYHEfi3K2UI1sdBQCOwj6EGsBn9LotLygyjfqxzIZQA0PCYeIJUSflk7OBELd6-kE='
  • Thanks for your time reading this. Hope you liked it, and sorry for bad formatting of the writeup. :)
This post is licensed under CC BY 4.0 by the author.