Azure is constantly adding new tools to its arsenal, and although Azure Data Factory is not new, I have recently been working with it a lot to manage and transform data. In the process I learned a lot about the various capabilities of ADF and also figured out a few tricks. In this series I will share what I have learned so far to help anyone who is starting a similar journey.

Azure Data Factory allows us to manage the ETL lifecycle for big data with the flexibility of serverless scaling. Let’s create a Data Factory and I will explain more as we proceed. Navigate to the Azure Portal if you are not already there and search for Data Factories in the search bar. Select Data Factories from the results.

You will be taken to the Data Factories Blade. If you have not created any then you should see a screen like below.

Click on Create Data Factory and you will be taken to a blade like the one below, where you enter details such as subscription, resource group, region, name and version. As of today V2 is the latest version.

If you like, you can set up a git repository for the Data Factory. I created a repo and added the details.

The next option is to add tags. I am going to skip that as I am not going to use them in this demo. Once you click Next you will see the Review + Create section. If all the validations pass you will be able to create the Data Factory.

Once you click on Create, you will see the deployment start, and once the resource is created you should see something like below.

Click on Go to resource and you will be taken to the Data Factory. You will see a lot of details about the Data Factory here, just like any other Azure resource. A couple of things that I would like to point out are

  • Author and Monitor – this will take us to another interface where we will be able to create the Pipelines (scheduled data-driven workflows) that will do the ETL for us.
  • Documentation – as you already know, this gives pretty detailed information about the different aspects of Data Factories.

I will leave the documentation for you to browse through. Let’s navigate to Author and Monitor.

If you added GitHub details while creating the Data Factory then you will be asked to log in to GitHub to give ADF the necessary permissions.

Once you are done with that you would see a screen like below. The main navigation is in the top left corner of the page. Click on the

This will take you to the screen below. Since this is the first time we are here, we do not have anything at the moment. Let’s go ahead and click on the + in the top left corner and select Pipeline.

This will create a pipeline and show the Activities toolbar, which we will use to drag and drop new activities into the pipeline.

You will also notice that there is a yellow indicator with the number 1 below the Author icon. This is because we now have changes that are unpublished.

* Your top menu might be a little different if you did not select GitHub for source control.

Let’s add a Copy data activity to the pipeline. We can do this by dragging and dropping the Copy data activity from Move & transform onto the central screen.

We will copy data from a SQL Server database to Blob storage. I am not going to cover the creation of the SQL Server database. If you do not have one then you could take it as an assignment to create one. Now let’s link to the SQL database under Source. Click on New.

This will open a flyout on the side and we can select SQL as the new dataset.

We do not have a linked service yet, so let’s go ahead and create one which will let us connect to the SQL database we have.

Once we have saved the dataset and linked service information, we should see something like below in the Source section

We want to copy the data to Blob storage. Just like the SQL database, I am not going to go into the details of creating the Blob storage. I hope you are able to set that up on your own. If you are unable to do so then please comment on the post and I will add the details for it. Let’s select New, create a new dataset for the Blob storage and select Azure Blob Storage.

Select DelimitedText and then select New to create the linked service.

Add the required details for the Azure Blob Storage and make sure you test the connection.
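For reference, the pipeline configured in the steps above corresponds roughly to a JSON definition behind the designer. The sketch below builds a hedged approximation of that shape as a Python dictionary. The pipeline, activity and dataset names are hypothetical, and the source type assumes an Azure SQL dataset (a dataset for an on-premises SQL Server would use a different source type), so treat it as an illustration rather than the exact JSON ADF generates.

```python
import json

# Hypothetical names: replace them with the names of your own datasets.
pipeline = {
    "name": "CopySqlToBlobPipeline",
    "properties": {
        "activities": [
            {
                "name": "CopySqlToBlob",
                "type": "Copy",
                # The source dataset created under the Source tab
                "inputs": [
                    {"referenceName": "SqlSourceDataset", "type": "DatasetReference"}
                ],
                # The DelimitedText dataset created under the Sink tab
                "outputs": [
                    {"referenceName": "BlobCsvDataset", "type": "DatasetReference"}
                ],
                "typeProperties": {
                    "source": {"type": "AzureSqlSource"},   # assumption: Azure SQL
                    "sink": {"type": "DelimitedTextSink"},
                },
            }
        ]
    },
}

print(json.dumps(pipeline, indent=2))
```

You can compare this with the real definition by opening the code view of your own pipeline in the authoring interface.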

Once we have set up the Copy data activity, click on Debug to run it.

This will run the pipeline in Debug mode and show the run details in the Output tab at the bottom. You can see more details about the run by clicking on the glasses icon. Clicking on the two arrow icons shows the input and output of this activity.

Also, when I browse to the container in the Blob Storage I am able to see the data from the SQL database there.

Hope you found this helpful. Feel free to leave feedback and questions.

WHO has declared Coronavirus a global pandemic and almost every country in the world is impacted by it. A lot of people are trying to track the state of the pandemic and its impact. A lot of people and organizations have come forward to help in this global crisis. But just like everything else, where there is good there is evil. Malicious apps, websites, scams and ransomware have sprung up to take advantage of the situation.

One of the ransomware apps is the Covid 19 Tracker, which promises to give you real-time tracking of the virus spreading near you.

But in the background it changes the password of your Android phone and locks it. It then demands $100 in Bitcoin to unlock your phone.

Be safe and just like always visit only the sites you trust and please refrain from installing untrusted apps on your mobile phone.

Microsoft has delivered the fastest project of this size that I know of to provide accurate and up to date information on the coronavirus (COVID-19). You can visit the live tracker at https://bing.com/covid. The website is mobile friendly as well.

XOR cipher is a simple additive encryption technique in itself, but it is commonly used inside other encryption techniques. The truth table for XOR is below. If the bits are the same then the result is 0, and if the bits are different then the result is 1.

Bit 1 | Bit 2 | Bit 1 XOR Bit 2 |
  0   |   0   |        0        |
  1   |   0   |        1        |
  0   |   1   |        1        |
  1   |   1   |        0        |

 

Let’s take an example. We will encrypt Sun by applying the key 01010010 to each character in turn.

Encryption
Text          |     S    |     u     |     n       |
ASCII Code    |    083   |    117    |    110      |
Binary        | 01010011 |  01110101 |  01101110   |
Key           | 01010010 |  01010010 |  01010010   |
Cipher        | 00000001 |  00100111 |  00111100   |

Now if we XOR the cipher with the same key we will get back our original text.

Decryption
Cipher        | 00000001 |  00100111 |  00111100   |  
Key           | 01010010 |  01010010 |  01010010   |
Output        | 01010011 |  01110101 |  01101110   |
ASCII Code    |    083   |    117    |     110     |
Text          |     S    |     u     |      n      |
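The two tables above can be reproduced with a few lines of Python. This is a minimal sketch written for Python 3 (the scripts in this post are Python 2); the helper name xor_bytes is my own and not part of the original code.

```python
KEY = 0b01010010  # the single-byte key from the example, reused for every character


def xor_bytes(data: bytes, key: int) -> bytes:
    """XOR every byte of data with the same one-byte key."""
    return bytes(b ^ key for b in data)


cipher = xor_bytes(b"Sun", KEY)
cipher_bits = [format(b, "08b") for b in cipher]
print(cipher_bits)  # ['00000001', '00100111', '00111100']

# Applying the same operation again restores the plain text
plain = xor_bytes(cipher, KEY)
print(plain.decode("ascii"))  # Sun
```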

The encryption we just did was not very secure because we used the same key over and over again. To make the encryption more secure we should use a key that does not repeat. A good technique for this is the One-time Pad, which uses a random key as long as the message and never reuses it. This makes the encryption much more resistant to brute-force attack.
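To make the one-time pad idea concrete, here is a small Python 3 sketch. It assumes the key is generated with the standard library's secrets module and is exactly as long as the message; the function names otp_encrypt and otp_decrypt are hypothetical and only for illustration.

```python
import secrets


def otp_encrypt(message: bytes):
    """Return (cipher, key), where key is random and as long as the message."""
    key = secrets.token_bytes(len(message))
    cipher = bytes(m ^ k for m, k in zip(message, key))
    return cipher, key


def otp_decrypt(cipher: bytes, key: bytes) -> bytes:
    """XOR the cipher with the same key to recover the message."""
    if len(cipher) != len(key):
        raise ValueError("key must be exactly as long as the cipher text")
    return bytes(c ^ k for c, k in zip(cipher, key))


cipher, key = otp_encrypt(b"Sun")
print(otp_decrypt(cipher, key))  # b'Sun'
```

The catch is that the key has to be shared securely and must never be used for a second message, which is what makes the scheme hard to use in practice.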

XOR encryption and decryption

Encryption and decryption using XOR share the same code. A Python implementation is below:

 

input_str = raw_input("Enter the cipher text or plain text: ")
key = raw_input("Enter the key for encryption or decryption: ")
no_of_itr = len(input_str)
output_str = ""


for i in range(no_of_itr):
    current = input_str[i]
    current_key = key[i%len(key)]
    output_str += chr(ord(current) ^ ord(current_key))

print "Here's the output: ", output_str
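The script above targets Python 2 (raw_input and the print statement). If you are on Python 3, a roughly equivalent sketch could look like the following. The function name xor_with_key is my own, and itertools.cycle stands in for the i % len(key) indexing.

```python
from itertools import cycle


def xor_with_key(text: str, key: str) -> str:
    """XOR each character of text with the key, repeating the key as needed."""
    if not key:
        raise ValueError("key must not be empty")
    return "".join(chr(ord(c) ^ ord(k)) for c, k in zip(text, cycle(key)))


encrypted = xor_with_key("Sun", "key")
decrypted = xor_with_key(encrypted, "key")
print(repr(encrypted))
print(decrypted)  # Sun
```

Because the output can contain non-printable characters, printing its repr() is usually easier to read than printing the raw string.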

And here’s a sample run

Image showing sample run of XOR encryption and decryption

 

The entire source code for this post can be found at https://github.com/abhishuk85/cryptography-plays

Any questions, comments or feedback are most welcome.

ROT13 is a letter substitution cipher and a special case of the Caesar Cipher where each character in the plain text is shifted exactly 13 places. If you are not familiar with the Caesar Cipher then have a look at the Caesar Cipher post first. For example, the cipher for SUN becomes FHA.

The cool thing about this technique is that if we do a ROT13 on the cipher text then we get back the plain text: each letter is shifted by 13 places twice, which adds up to 26, the length of the alphabet. For example, when we do a ROT13 on FHA we get back SUN.
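A quick way to check the SUN and FHA example is Python's built-in rot_13 codec. This is a Python 3 sketch and is separate from the implementation later in this post.

```python
import codecs

cipher = codecs.encode("SUN", "rot_13")
print(cipher)  # FHA

# Applying ROT13 a second time brings back the plain text
plain = codecs.encode(cipher, "rot_13")
print(plain)  # SUN
```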

 

A block representation of ROT13 encryption and decryption

 

ROT13 Encoder and Decoder

The encoder and decoder for ROT13 are the same, because the shift is 13 for both encoding and decoding and no special logic is needed when decoding. Below is the Python code for it. The code is pretty much the same as for the Caesar Cipher, with the shift value always set to 13. Note that it only shifts uppercase letters; any other character is copied to the output unchanged.

 

alphabets = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"

input_str = raw_input("Enter message that you would like to encrypt/decrypt using ROT13: ")
shift = 13
no_of_itr = len(input_str)
output_str = ""

for i in range(no_of_itr):
    current = input_str[i]
    location = alphabets.find(current)
    if location < 0:
        output_str += input_str[i]
    else:
        new_location = (location + shift)%26
        output_str += alphabets[new_location]

print "Here's the output: ", output_str
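For readers on Python 3, the same logic can be wrapped in a function, as sketched below. The name rot13_upper is hypothetical. Like the script above, it only shifts uppercase letters and leaves everything else untouched.

```python
import string

ALPHABETS = string.ascii_uppercase  # "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
SHIFT = 13


def rot13_upper(message: str) -> str:
    """Shift uppercase letters by 13 places; copy other characters as they are."""
    output = []
    for current in message:
        location = ALPHABETS.find(current)
        if location < 0:
            output.append(current)
        else:
            output.append(ALPHABETS[(location + SHIFT) % 26])
    return "".join(output)


print(rot13_upper("SUN"))               # FHA
print(rot13_upper(rot13_upper("SUN")))  # SUN
```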

Here’s a sample run of ROT13

Image showing sample run of ROT13 encoder decoder

 

The entire source code for this post can be found at https://github.com/abhishuk85/cryptography-plays

Any questions, comments or feedback are most welcome.

Base64 is a binary-to-text encoding technique rather than an encryption technique, but I thought it made sense to cover it in this series because it is widely used, especially for transmitting data over the wire. The reason is that the characters selected for this encoding are printable and common to most character encodings.

Here is the Base64 index table:

Index Char Index Char Index Char Index Char
0 A 16 Q 32 g 48 w
1 B 17 R 33 h 49 x
2 C 18 S 34 i 50 y
3 D 19 T 35 j 51 z
4 E 20 U 36 k 52 0
5 F 21 V 37 l 53 1
6 G 22 W 38 m 54 2
7 H 23 X 39 n 55 3
8 I 24 Y 40 o 56 4
9 J 25 Z 41 p 57 5
10 K 26 a 42 q 58 6
11 L 27 b 43 r 59 7
12 M 28 c 44 s 60 8
13 N 29 d 45 t 61 9
14 O 30 e 46 u 62 +
15 P 31 f 47 v 63 /

 

The conversion of a string into Base64 happens by taking the 8-bit binary equivalent of each character, slicing the resulting bits into 6-bit units (Base64 has 2^6 = 64 symbols, with indexes 0 to 63) and then representing each unit with the matching character from the index table above. Let’s take the string Sun as an example and see how it is represented in Base64.

 

Text          |     S    |     u     |     n       |
ASCII Code    |    083   |    117    |    110      |
Binary        | 01010011 |  01110101 |  01101110   |
6-bit         | 010100 | 110111 | 010101 | 101110  |
Base64 Index  |   20   |    55  |   21   |   46    |
Base64 encoded|    U   |    3   |   V    |    u    |
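The slicing in the table above can also be done by hand in code. The following Python 3 sketch is only meant to mirror the table: the function name base64_by_hand is my own, and it pads the result with "=" to a multiple of 4 characters the way standard Base64 does. In real code you would use the base64 module instead.

```python
import base64

# The index table: A-Z, a-z, 0-9, then + and /
ALPHABET = (
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789+/"
)


def base64_by_hand(data: bytes) -> str:
    # 8-bit binary form of every byte, joined into one long bit string
    bits = "".join(format(byte, "08b") for byte in data)
    # Pad with zero bits so the string splits evenly into 6-bit units
    bits += "0" * (-len(bits) % 6)
    # Look up each 6-bit unit in the index table
    encoded = "".join(
        ALPHABET[int(bits[i:i + 6], 2)] for i in range(0, len(bits), 6)
    )
    # Pad the output with "=" up to a multiple of 4 characters
    return encoded + "=" * (-len(encoded) % 4)


print(base64_by_hand(b"Sun"))  # U3Vu
```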

We can verify this by converting the string with Python

>>> "Sun".encode("base64")
'U3Vu\n'

 

The newline character that we see at the end of the output is ignored. Whether we decode the string with or without the newline, we still get the same string back

>>> "U3Vu\n".decode("base64")
'Sun'
>>> "U3Vu".decode("base64")
'Sun'

 

The number of characters in the output has to be a multiple of 4. If it is not, the output is padded with either one or two “=” to make it so. For example, when we convert Earth to Base64 we see this in action

>>> "Earth".encode("base64")
'RWFydGg=\n'
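The interpreter sessions above use the "base64" string codec, which only exists in Python 2. On Python 3 the same checks can be done with the standard base64 module, as in this short sketch. b64encode does not add the trailing newline, while encodebytes does.

```python
import base64

sun_plain = base64.b64encode(b"Sun")
sun_with_newline = base64.encodebytes(b"Sun")
earth = base64.b64encode(b"Earth")

print(sun_plain)         # b'U3Vu'
print(sun_with_newline)  # b'U3Vu\n'
print(earth)             # b'RWFydGg='

# Decoding ignores the trailing newline
print(base64.b64decode(b"U3Vu\n"))  # b'Sun'
print(base64.b64decode(b"U3Vu"))    # b'Sun'
```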

 

Base64 Encoder

Sometimes, for various reasons, strings are Base64 encoded multiple times and, as you might have noticed by now, this increases the length of the output. The Base64 encoder that I wrote, using the one built into Python, takes the number of times you would like to encode your string. The code is pretty straightforward.

 

input_str = raw_input("Enter the string that you like to be base64 encoded:")
times = int(raw_input("How deep do you want it encoded:"))

output_str = input_str

for i in range(times):
    output_str = output_str.encode("base64")

print "Encoded string: ", output_str
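A Python 3 sketch of the same idea is below. The function name encode_repeatedly is hypothetical. One difference to be aware of: it uses base64.b64encode, which does not append a newline after each pass, so for more than one pass its output will differ from what the Python 2 script above produces.

```python
import base64


def encode_repeatedly(text: str, times: int) -> str:
    """Base64 encode text the given number of times."""
    data = text.encode("utf-8")
    for _ in range(times):
        data = base64.b64encode(data)
    return data.decode("ascii")


print(encode_repeatedly("Sun", 1))  # U3Vu
print(encode_repeatedly("Sun", 2))  # VTNWdQ==
```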

 

And here is a sample run

 

Image showing sample run of Base64 encoder

 

Base64 Decoder

This is where it gets a little bit trickier, since while decoding I assume that I do not know the number of times the text was encoded. I created a base string that contains all the valid characters in Base64 encoded strings and then take the Base64 encoded string as input

 

base_64_encoding_characters = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/="

input_str = raw_input("Enter the base64 encoded string that you would like to decode: ")

With the string to be decoded in hand, we go into a while loop and stay in it until we have a potential candidate for the original string. The basic logic is to try to decode the string; if it fails to decode, we append an “=” to its end, increase the error count and try again. A successful decode resets the error count. We retry like this at most twice and keep going until we have a string that cannot be decoded.

 

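import binascii

error_count = 0   # consecutive failed decode attempts
depth = 0         # number of successful decodes so far
output_str = ""   # last successfully decoded string

while error_count < 3:
    # Strip newlines and cut the string at the first invalid character
    input_str, is_end = ValidateAndSplit(input_str.replace('\n',''))

    if is_end:
        break
    try:
        temp = input_str.decode("base64")
        input_str = temp
        output_str = temp
        depth = depth + 1
        error_count = 0
        print input_str
    except binascii.Error as err:
        # Decoding failed: add padding and try again
        error_count = error_count + 1
        input_str = input_str + "="

print "Potential decoded string: ", output_str, "\nWith depth: ", depth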

The ValidateAndSplit method basically tries to remove unnecessary characters from the string to make sure we don’t go down a bad path, and also tells us when we have potentially reached the end of our search

 

def ValidateAndSplit(input_str):
    is_end = False
    n = len(input_str)
    if n < 1:
        is_end = True
        return input_str, is_end

    for i in range(n):
        c = input_str[i]
        location = base_64_encoding_characters.find(c)
        if location < 0 and c == " ":
            is_end = True
            break
        elif location < 0:
            data = input_str.split(c, 1)
            input_str = data[0]
            break

    return input_str, is_end
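For comparison, here is a simplified Python 3 sketch of the same idea. It is a variation and not a port of the code above: instead of ValidateAndSplit and the error counter, it adds any missing "=" padding up front, relies on the validate=True option of base64.b64decode to reject strings with characters outside the Base64 alphabet, and stops as soon as a decode fails or the result is not ASCII text. The function name decode_repeatedly and the max_depth limit are my own assumptions.

```python
import base64
import binascii


def decode_repeatedly(text: str, max_depth: int = 50):
    """Return (decoded_text, depth) after decoding as many times as possible."""
    current = text.replace("\n", "")
    depth = 0
    while current and depth < max_depth:
        # Add any missing "=" so the length is a multiple of 4
        padded = current + "=" * (-len(current) % 4)
        try:
            decoded = base64.b64decode(padded, validate=True).decode("ascii")
        except (binascii.Error, UnicodeDecodeError):
            break  # not valid Base64, or not text: treat current as the answer
        current = decoded.replace("\n", "")
        depth += 1
    return current, depth


# Encode a sample three times, then recover it without knowing the depth
sample = b"Hello, World!"
for _ in range(3):
    sample = base64.b64encode(sample)

print(decode_repeatedly(sample.decode("ascii")))  # ('Hello, World!', 3)
```

Like the original, this can decode too far when the plain text itself happens to look like valid Base64.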

Here’s a sample run of this decoder with the same string that we Base64 encoded 10 times earlier

 

Image showing sample run of Base64 decoder

 

The problem with the current approach is that we might over-decode strings that are only one word long. One fix could be to reach out to an online dictionary and check whether we have found a valid word.

 

The entire source code for this post can be found at https://github.com/abhishuk85/cryptography-plays

Any questions, comments or feedback are most welcome.