Verification codes (CAPTCHAs) are designed to stop computers from filling out forms automatically by verifying that you are a real "person." But with the rise of deep learning and computer vision, they are now often easy to defeat.
I've been reading Adrian Rosebrock's book "Deep Learning for Computer Vision with Python." In it, Adrian walks through how he cracked the verification code system on the E-ZPass New York website using machine learning:
Adrian does not have access to the source code of the application that generated the verification code image. To break this system, he had to download hundreds of sample images and manually solve them to train his system.
However, if we want to crack an open source verification code system, where do we go to access the source code?
I went to the WordPress.org plugin directory and searched for "CAPTCHA". The top result is called "Really Simple CAPTCHA" and has over 1 million installations:
WordPress.org plugin directory: https://wordpress.org/plugins/
The best part is that it comes with its source code! Since we have the source code that generates the verification codes, this should be easy to crack. To make things more challenging, let's give ourselves a time limit. Can we crack this verification code system completely in 15 minutes? Let's try it!
Important note: This is by no means a criticism of the "Really Simple CAPTCHA" plugin or its author. The plugin's author says it is not secure and recommends using something else. This is just a fun, quick technical challenge. But if you are one of the 1 million users, maybe you should switch to something else :)
The challenge
First, we need to know what kind of images Really Simple CAPTCHA generates. On the demo site, we see:
Really Simple CAPTCHA Address: https://wordpress.org/plugins/really-simple-captcha/
Demo code picture
The captcha image looks like four letters. Let's verify this in the PHP source code:
public function __construct() {
    $this->chars = 'ABCDEFGHJKLMNPQRSTUVWXYZ23456789';
    $this->char_length = 4;
}
Yes, it generates 4-character verification codes using a random mix of 4 different fonts. The character set leaves out "O" and "I" (as well as "0" and "1") to avoid user confusion. That leaves us with 32 possible letters and digits.
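As a quick sanity check, the character set from the constructor can be counted in a few lines of Python. This is only a sketch: the variable names are mine, and the total number of codes assumes each of the four positions is drawn independently from the full set, which the PHP excerpt alone does not confirm.

```python
# Character set copied from the plugin constructor shown above.
chars = "ABCDEFGHJKLMNPQRSTUVWXYZ23456789"
code_length = 4

letters = [c for c in chars if c.isalpha()]
digits = [c for c in chars if c.isdigit()]

print(len(chars), len(letters), len(digits))  # 32 24 8

# Assumption: every position is chosen independently from the full set.
print(len(chars) ** code_length)  # 1048576
```

About a million possible codes is far too many to memorize, but each individual character is just a 32-way classification problem.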
The time so far: 2 minutes
Our toolset
Before we go any further, let's go over the tools we will use to solve this problem:
Python 3
Python is a fun programming language with good libraries for machine learning and computer vision.
OpenCV
OpenCV is a popular computer vision and image processing framework. We will use OpenCV to process the captcha image. It has a Python API so we can use it directly in Python.
Keras
Keras is a deep learning framework written in Python. It makes it easy to define, train, and use deep neural networks with minimal coding.
TensorFlow
TensorFlow is Google's machine learning library. We will write our code in Keras, but Keras doesn't actually implement the neural network logic itself. Instead, it uses Google's TensorFlow library behind the scenes to do the heavy lifting.
Well, back to the challenge.
Create a data set
Training any machine learning system requires training data. To break the verification code system, we need such training data:
Since we have the source code for the WordPress plugin, we can modify it to save 10,000 captcha images along with the expected answer for each image.
After hacking on the code for a few minutes and adding a simple for loop, I had a folder of training data: 10,000 PNG files, each with its correct answer as the file name:
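Because the answer is stored in the file name, reading a label back out is a one-liner. The helper below is a minimal sketch; the function name, folder name, and sample file name are hypothetical and only illustrate the naming scheme described above.

```python
import os


def label_from_path(path):
    """Return the expected answer encoded in an image's file name."""
    filename = os.path.basename(path)
    return os.path.splitext(filename)[0]


# Hypothetical folder and file name, for illustration only.
print(label_from_path("generated_captcha_images/2A2X.png"))  # 2A2X
```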
This is the only part of the sample code I won't give you. We are doing this for education, and I don't want you to actually go out and spam real sites. But I will give you the 10,000 images I generated so that you can replicate my results.
The time so far: 5 minutes
Simplify the problem
Now that we have the training data, we could use it directly to train a neural network:
With enough training data, this approach might even work, but we can make the problem much simpler. The simpler the problem, the less training data and the less computing power we need to solve it. We only have 15 minutes, after all!
Fortunately, these captcha images always consist of exactly four characters. If we can split each image so that every letter is a separate image, then we only need to train the neural network to recognize a single letter at a time:
I don't have time to go through 10,000 training images and split them up by hand in Photoshop. That would take days, and I have only 10 minutes left. We also can't simply cut each image into four equal-sized chunks, because the plugin places the letters at random horizontal positions, as shown in the following figure:
The letters in each image are placed randomly, making image segmentation more difficult.
Fortunately, we can still achieve automation. In image processing, we often need to detect "blobs" of pixels that have the same color. The boundaries of these consecutive pixels are called contours. OpenCV has a built-in findContours() function that we can use to detect these contiguous areas.
We will start with an original verification code image:
Then we convert the image into pure black and white pixels (this is called thresholding), so that it is easy to find the contour boundaries of the continuous regions:
Next, we will use OpenCV's findContours() function to detect the separate parts of the image that contain blocks of consecutive pixels of the same color:
Then we save each region as a separate image file. Because we know that each image should contain four letters from left to right, we can use that knowledge to label the letters as we save them. As long as we save them in that order, each letter image gets stored under the name of the letter it shows.
But wait, I see a problem! Sometimes the verification code has overlapping letters like this:
This means we will end up extracting regions that merge two letters into one:
If we don't deal with this issue, we will produce bad training data. We need to fix it so that we don't accidentally teach the machine to recognize two squashed-together letters as a single letter.
There is a simple trick: if a region is wider than it is tall, we probably have two letters squeezed together. In that case, we can split the region down the middle and treat the halves as two separate letters:
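A minimal sketch of that splitting rule, working on plain (x, y, w, h) tuples so it runs without any image library. The function name and the `max_ratio` parameter are my own; the default of 1.0 mirrors the "wider than tall" rule, and a stricter ratio may work better in practice.

```python
def split_wide_boxes(boxes, max_ratio=1.0):
    """Split any (x, y, w, h) box whose width/height exceeds max_ratio."""
    result = []
    for x, y, w, h in boxes:
        if w / h > max_ratio:
            # Probably two letters stuck together: cut down the middle.
            half = w // 2
            result.append((x, y, half, h))
            # The right half keeps the extra pixel when w is odd.
            result.append((x + half, y, w - half, h))
        else:
            result.append((x, y, w, h))
    return result


boxes = [(2, 5, 12, 16), (20, 5, 30, 16), (55, 5, 11, 16)]
print(split_wide_boxes(boxes))
# [(2, 5, 12, 16), (20, 5, 15, 16), (35, 5, 15, 16), (55, 5, 11, 16)]
```

The cut is crude, since the two letters rarely meet exactly in the middle, but it only has to be good enough to produce usable training samples.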
Now that we have a way to extract individual letters, let's run it on all the captcha images. The goal is to collect many different variants of each letter. We can save each letter in its own folder.
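One way to organize the output is sketched below. It pairs each left-to-right box with the matching character of the answer and picks a file path inside that character's folder. The function name, folder name, and numbering scheme are assumptions, and images whose box count does not match the answer length are skipped so they cannot pollute the training data.

```python
import os
from collections import Counter


def plan_letter_files(label, boxes, output_dir, counts):
    """Pair each left-to-right box with its letter and choose a save path.

    Returns an empty list when the number of boxes does not match the
    number of characters in the label.
    """
    if len(boxes) != len(label):
        return []
    plan = []
    for letter, box in zip(label, boxes):
        counts[letter] += 1
        path = os.path.join(output_dir, letter, f"{counts[letter]:06d}.png")
        plan.append((path, box))
    return plan


counts = Counter()
boxes = [(0, 0, 10, 16), (12, 0, 10, 16), (24, 0, 10, 16), (36, 0, 10, 16)]
plan = plan_letter_files("WW2A", boxes, "extracted_letters", counts)
for path, box in plan:
    print(path, box)
```

Keeping a running count per letter gives every crop a unique name, so the second "W" in a code does not overwrite the first.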
Here is a picture of the "W" folder after I extracted all the letters:
The time so far: 10 minutes
Build and train neural networks
Because we only need to recognize images of single letters, we don't need a very complex neural network architecture. Recognizing letters is much easier than recognizing complex images like cats and dogs.
We will use a simple convolutional neural network architecture that has two convolutional layers and two fully connected layers:
Defining this neural network architecture requires only a few lines of Keras code:
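from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Flatten, Dense

# Build the neural network
model = Sequential()
# First convolutional layer with max pooling (input: 20x20 grayscale letter)
model.add(Conv2D(20, (5, 5), padding="same", input_shape=(20, 20, 1), activation="relu"))
model.add(MaxPooling2D(pool_size=(2, 2), strides=(2, 2)))
# Second convolutional layer with max pooling
model.add(Conv2D(50, (5, 5), padding="same", activation="relu"))
model.add(MaxPooling2D(pool_size=(2, 2), strides=(2, 2)))
# Fully connected hidden layer with 500 nodes
model.add(Flatten())
model.add(Dense(500, activation="relu"))
# Output layer: one node for each of the 32 possible characters
model.add(Dense(32, activation="softmax"))
model.compile(loss="categorical_crossentropy", optimizer="adam", metrics=["accuracy"])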
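To get a feel for how small this network is, the layer sizes can be worked out by hand. The sketch below redoes that arithmetic in plain Python using the standard parameter-count formulas for convolutional and dense layers; the helper names are mine, and it relies on padding="same" keeping the spatial size unchanged through each convolution.

```python
def conv_params(kernel, in_channels, filters):
    """Weights plus one bias per filter for a square-kernel conv layer."""
    return kernel * kernel * in_channels * filters + filters


def dense_params(inputs, units):
    """Weights plus one bias per unit for a fully connected layer."""
    return inputs * units + units


side = 20                # input is 20x20x1
side //= 2               # first 2x2 max pool with stride 2: 10x10
side //= 2               # second 2x2 max pool with stride 2: 5x5
flat = side * side * 50  # 50 feature maps come out of the second conv layer

total = (conv_params(5, 1, 20) + conv_params(5, 20, 50)
         + dense_params(flat, 500) + dense_params(500, 32))
print(flat, total)  # 1250 667102
```

Almost all of the roughly 667,000 parameters sit in the first fully connected layer, which is typical for this kind of architecture.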
Now we can run it.
# Train the neural network
model.fit(X_train, Y_train, validation_data=(X_test, Y_test), batch_size=32, epochs=10, verbose=1)
After 10 passes over the training data set, we reached nearly 100% accuracy. At this point, we should be able to bypass this verification code automatically whenever we want.
The time so far: 15 minutes
Use the trained model to solve the verification code
Now that we have a trained neural network, using it to crack the verification code is simple:
1. Get a real captcha image from the WordPress plugin's website.
2. Split the captcha image into four separate letter images using the same method we used to create the training dataset.
3. Let our neural network make a separate prediction for each letter image.
4. Use the four predicted letters as the answer to the verification code.
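Steps 3 and 4 above can be sketched without a trained model by feeding in made-up probability vectors. Everything here is illustrative: the function names are hypothetical, `fake_softmax` merely imitates the shape of a 32-way softmax output, and the mapping from class index to character assumes the classes were ordered by sorting the character set, which depends on how the labels were encoded during training.

```python
# Assumption: class index i maps to the i-th character in sorted order.
ALPHABET = sorted("ABCDEFGHJKLMNPQRSTUVWXYZ23456789")


def decode(predictions, alphabet=ALPHABET):
    """Turn one probability vector per letter image into the answer text."""
    letters = []
    for probs in predictions:
        # Pick the class with the highest probability (argmax).
        best = max(range(len(probs)), key=lambda i: probs[i])
        letters.append(alphabet[best])
    return "".join(letters)


def fake_softmax(index, size=32):
    """Stand-in for a model output: most of the mass on one class."""
    return [0.9 if i == index else 0.1 / (size - 1) for i in range(size)]


predictions = [fake_softmax(ALPHABET.index(c)) for c in "2A2X"]
print(decode(predictions))  # 2A2X
```

With a real model, `predictions` would come from calling the network on the four letter images in left-to-right order.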
Here is how our model decodes the real verification code:
Or from the command line: