Investigating Digital Crimes using Python - PyConES 2020

Oct 3, 2020 08:30 · 2391 words · 12 minute read

Hello everyone, my name is Gajendra Deshpande. I am working as an assistant professor at KLS Gogte Institute of Technology, India. Today I will be delivering a talk on investigating digital crimes using Python. In today’s talk I will be discussing in brief an introduction to digital crimes, the digital forensic investigation process, and some Python modules such as pyscreenshot, PyPDF2, etc. Then we’ll see the investigation of emails and the investigation of embedded metadata, and finally the conclusion.

So let’s see some cyber crime statistics, but before that let’s define cyber crime. Cyber crime is nothing but crime done using digital devices and gadgets such as laptops and mobile phones. The Internet Crime Report for 2019, released by the USA’s Internet Crime Complaint Center (IC3) of the FBI, has revealed the top four countries that are victims of internet crimes: the USA, the UK, Canada, followed by India. According to an RSA report of 2015, mobile transactions are rapidly growing and cyber criminals are migrating to less protected soft channels. That is because not many people are aware of the security and privacy settings of mobile phones. According to a report by Norton in 2015, an estimated 113 million Indians lost about 16,558 rupees each on average to cyber crime, which is almost equal to 250 US dollars. According to an article published in the Indian Express on 19th November 2016, 55 million millennials in India were hit by cyber crime. A recent study by Check Point Research has recorded over 150,000 cyber attacks every week during the COVID-19 pandemic, an increase of 30 percent compared to previous weeks.

Now let’s define the field of forensic science. Forensic science is the use of scientific methods or expertise to investigate crimes or examine evidence that might be presented in a court of law. Cyber forensics is the investigation of various crimes happening in cyberspace. Examples of cyber attacks include phishing, ransomware, fake news, fake
medicine, extortion, and insider fraud. Note here that cyber attacks can be classified into external attacks and insider attacks. Insider threats, or insider attacks, are more dangerous because unhappy employees may be involved in revealing confidential information to outsiders, and this information can be further used to carry out the attacks.

Then, according to the Digital Forensics Research Workshop, digital forensics can be defined as the use of scientifically derived and proven methods toward the preservation, collection, validation, identification, analysis, interpretation, documentation, and presentation of digital evidence derived from digital sources for the purpose of facilitating or furthering the reconstruction of events found to be criminal, or helping to anticipate unauthorized actions shown to be disruptive to planned operations.

Now, the digital forensic investigation process has six steps: identification, collection, validation, examination, preservation, and finally presentation. So let’s discuss each of these steps in brief.

In the identification step, the investigation officer visits the crime location and identifies the different devices which he will be seizing for the investigation process. These may include mobiles, laptops, computers, various gadgets, different parts of computers such as hard disks, network cables, USB pen drives, etc.

The next step is the collection of evidence. When the investigation officer visits the crime location, the system may be on. Note here that he needs to take a picture of the system state and collect the evidence from the running system, so he should not switch off the system. The process of collecting information from the live, running system at a crime location is known as live forensics, and it is very, very important. If you switch off the system, its state will change and data may be lost. Note here that some evidence may be present in the volatile memory. So your evidence
can be classified into volatile evidence and non-volatile evidence. So in this process the investigation officer has to collect the volatile evidence first, because if he switches off the system, that will change the system state and the evidence may be lost. Similarly, if the system is off, then he should not turn on the system. That’s a very, very important step, because turning it on may alter the system state, there may be a loss of information, and the result may not be accepted as evidence in the court of law.

The third step is validation. Note here that the investigation officer is taking a snapshot of the system, that is, an image of the system. Investigators cannot perform the investigation on the original data, so they need to make a copy of the data. When they make the copy, they should ensure that it is an exact copy. In this case they can use cryptographic hash functions to verify that the original and the copy of the data are the same.

In the fourth step, examination, investigation officers are going to use various tools. They may also write Python scripts, or scripts in any other programming language, for various purposes. The main goal of the examination step is to find the evidence.

Next is the preservation step. Note here that whatever components, equipment, and digital devices you seize need to be protected. They need to be kept in a proper place. For example, if there’s a hard disk, then that hard disk has to be placed in a standard bag known as a Faraday bag, and it has to be placed in a locker, at a proper place, under proper security, so that the information is not altered.

Then finally the presentation. In this step the investigation officer will present the evidence in the court of law. Note here that there’s a standard procedure, and the investigation officer has to follow it. If they don’t follow the standard procedure, then the findings may not be accepted as evidence in the court
of law.

Then, some Python modules or packages for digital forensics are:

- pyscreenshot, which takes a screenshot of the screen.
- quopri, which does the encoding and decoding of MIME information. MIME is nothing but the Multipurpose Internet Mail Extensions, which are used while sending and receiving emails.
- mutagen, a Python module to handle audio metadata.
- PyPDF2, a PDF toolkit, which again handles the metadata associated with PDF files.
- pefile, a multi-platform Python module to parse and work with portable executable files.

Note here that all these Python packages are pure Python packages, which means they don’t need any dependency.

Let’s first discuss pyscreenshot. It tries to allow you to take screenshots without installing third-party libraries. It is cross-platform but mainly useful for Linux-based distributions. Note that it’s a cross-platform wrapper, which means there are various backend libraries underneath. You can see the installation command, which mentions the Pillow library. That means in this example pyscreenshot will work as a wrapper for Pillow, which is an image processing library in Python.

This is a code example which takes a screenshot of the entire screen. The first step is to import the pyscreenshot module, then use the grab method to capture the entire screen, then use the save method to save the screenshot. Similarly, you can also take a screenshot of part of the screen. In that case you need to specify the coordinates x1, y1 and x2, y2.

Now, you can also look at performance, but performance is not the goal of pyscreenshot. The main goal is to collect the evidence. If you are concerned with performance, then you can use the pyscreenshot speedtest module, which gives performance information for the various backends. Now when you run this command, it will basically take 10
screenshots and note the times for taking those 10 screenshots. If you want, you can optimize it by disabling the child processes, in which case it will take less time with some libraries. So depending on your requirement you can choose the appropriate backend library. If you want to set a particular backend, say scrot instead of Pillow, then you can specify it in the grab method, and similarly you can disable the child process. So when your backend is mss and the child process is disabled, this code will give you better performance.

Next is email investigation. You can see here in my slide a fake email I have got. I don’t have any Netflix account, but it says that there is a statement and I need to make a payment. You can also observe that it’s clearly a fake email, because it is not from a genuine Netflix account. Similarly, there are some business offers. I don’t know this person, but I have received a business offer, maybe to settle an amount of 15.5 million. Again it’s a fake email. Such emails you can identify using open source tools. Commercial tools are also available, but if you want, you can also write the Python code yourself.

Now, for email investigation the investigator has the following goals: to identify the main criminal, to collect the necessary evidence, to present the findings, and finally to build the case. The challenges in email forensics are fake emails, spoofing, and anonymous re-mailing. In anonymous re-mailing, what happens is that the server drops the identification information and just sends the email content. Then there are some techniques used in email investigation. A widely used technique is header analysis, then server investigation, then finally network device investigation. Apart from these, there are other techniques as well.

Now in this code, what is happening is that the headers are accessed, and the body content, attachments, and other payload information are extracted. Now in this
code we are extracting the message body content by using the get_payload method. Then finally we are checking the MIME content type so that the storage of the email can be handled properly. MIME stands for Multipurpose Internet Mail Extensions. When we deal with MIME content we are dealing with different types of data: text content, HTML content, image content, and audio and video content. When we store different files such as PDFs we need more storage, and that can be handled if you are checking the content type of the mail. You can also extract the attachments using this code, and using the code mentioned in this slide you can extract the message body. After extracting the information, the important step is to compare it with the original information and check whether it is fake or original.

The next step is metadata forensics. We know that metadata is associated with every type of file, be it a text file, audio file, image file, source code file, or any other. mutagen is the Python module which can handle audio metadata. It supports various audio types: MP4, FLAC, MP3, Ogg, et cetera. It works with Python 3.6. You can find more information on mutagen’s Read the Docs website, and you can install it using the pip command. In this code, the File function takes any file as input, guesses its type, and returns a file type instance. In this code, the length and bitrate of the MP3 file are extracted.

Similarly, if you want to deal with the metadata of a PDF file, you need a separate module, that is PyPDF2. Again, it’s a pure Python library built as a PDF toolkit. It is capable of extracting document information, splitting documents page by page, merging documents page by page, cropping pages, merging multiple pages into a single page, and encrypting and decrypting PDF files. It is a very useful tool for websites that manage or
manipulate PDF files. Again, as in the case of email investigation, this is the first step, that is, extracting the information. After extracting the information you need to perform some kind of string processing or document processing operation, or you can use advanced technology such as machine learning or natural language processing to analyze the information and find the evidence.

Next is pefile, where PE is nothing but portable executable. It’s a multi-platform Python module to parse and work with portable executable files. Most of the information contained in the PE file headers is accessible, as well as all the sections’ details and data. So you can deal with .exe files and .dll files and extract their metadata using the pefile module, and later this information can be used for investigation. Some of the tasks that pefile makes possible are:

- inspecting headers
- analyzing sections’ data
- retrieving embedded data
- reading strings from the resources
- warnings for suspicious and malformed values
- overwriting fields, which should mostly be safe
- packer detection with PEiD signatures
- PEiD signature generation

If you observe, the functionality of most of these libraries is the same. Each one extracts additional information, and this additional information can be used for further investigation. Maybe we can compare it with the original files; we can check whether a file has changed, when it has changed, and whether it contains any extra information which gives hints about the criminals or the data.

Then the conclusion. It is very important to follow the standard procedure laid down by law enforcement agencies during the investigation process. There are many open source as well as commercial tools for digital forensics, but learning to develop your own tool is advantageous. Many tools written in Python are pure Python implementations, so they don’t have any dependencies. Thank you.
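
The validation step mentioned in the talk, checking that a working copy is an exact copy of the original, can be sketched with the standard library's hashlib module. This is only a minimal illustration: the talk names no specific hash algorithm, so SHA-256 is an assumption, and the file names and helper functions (`file_digest`, `is_exact_copy`, `demo`) are hypothetical.

```python
import hashlib
import os
import shutil
import tempfile


def file_digest(path, algorithm="sha256", chunk_size=1024 * 1024):
    """Hash a file in chunks so large disk images need not fit in memory."""
    hasher = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def is_exact_copy(original_path, copy_path, algorithm="sha256"):
    """Return True when both files produce the same digest."""
    return file_digest(original_path, algorithm) == file_digest(copy_path, algorithm)


def demo():
    """Create a stand-in 'image', copy it, then alter the copy by one byte."""
    workdir = tempfile.mkdtemp()
    try:
        original = os.path.join(workdir, "evidence.img")        # hypothetical name
        working_copy = os.path.join(workdir, "evidence_copy.img")
        with open(original, "wb") as handle:
            handle.write(b"sample evidence bytes" * 1000)
        shutil.copyfile(original, working_copy)
        unchanged = is_exact_copy(original, working_copy)
        with open(working_copy, "ab") as handle:
            handle.write(b"x")                                  # simulate alteration
        altered = is_exact_copy(original, working_copy)
        return unchanged, altered
    finally:
        shutil.rmtree(workdir)


if __name__ == "__main__":
    print(demo())
```

Recording the digest at seizure time and recomputing it before analysis gives a simple way to show that the copy under examination was not altered.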
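
For the quopri module listed among the packages, a small round trip shows what quoted-printable encoding of MIME content looks like. The sample text is made up for illustration; `encodestring` and `decodestring` are the standard library functions.

```python
import quopri

# Made-up sample text containing a non-ASCII character.
original = "Café invoice attached".encode("utf-8")

# Non-ASCII bytes become =XX escapes; plain ASCII text stays readable.
encoded = quopri.encodestring(original)

# Decoding restores the original bytes exactly.
decoded = quopri.decodestring(encoded)

if __name__ == "__main__":
    print(encoded)
    print(decoded.decode("utf-8"))
```

An investigator who finds `=C3=A9` style sequences in a raw message body can decode them this way before searching the text.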
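
The slides with the pyscreenshot code are not part of this transcript, so the following is a hedged reconstruction of the steps described: grab the whole screen, grab a region given x1, y1, x2, y2, and optionally pick a backend and disable the child process. It assumes pyscreenshot and a backend such as Pillow are installed and a display is available. The function names, the output paths, and the `region_bbox` helper are hypothetical; `grab(bbox=..., backend=..., childprocess=...)` and `save` are the calls the talk refers to.

```python
def region_bbox(x, y, width, height):
    """Convert a top-left corner plus size into the (x1, y1, x2, y2) box."""
    return (x, y, x + width, y + height)


def capture_full_screen(output_path):
    """Grab the entire screen and save it to output_path."""
    import pyscreenshot as ImageGrab  # third-party; imported lazily

    image = ImageGrab.grab()
    image.save(output_path)
    return output_path


def capture_region(output_path, bbox, backend=None, childprocess=True):
    """Grab only the bbox region; optionally force a backend such as "mss"
    or "scrot" and disable the child process for speed."""
    import pyscreenshot as ImageGrab  # third-party; imported lazily

    image = ImageGrab.grab(bbox=bbox, backend=backend, childprocess=childprocess)
    image.save(output_path)
    return output_path


if __name__ == "__main__":
    capture_full_screen("fullscreen.png")
    capture_region("region.png", region_bbox(10, 10, 500, 500))
    # Faster variant described in the talk: explicit backend, no child process.
    capture_region("region_fast.png", region_bbox(10, 10, 500, 500),
                   backend="mss", childprocess=False)
```

Whether a given backend is actually faster depends on the platform, which is what the speed test mentioned in the talk is meant to measure.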
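
The email steps described in the talk (read the headers, get the body with get_payload, check the MIME content type, pull out attachments) can be sketched with the standard library's email package. The original slide code is not in the transcript, so this is an illustration under assumptions: the sample message, the addresses, the `summarize_email` and `build_sample_message` helpers, and the Reply-To mismatch check are all hypothetical, and the mismatch is only a hint, not proof of spoofing.

```python
from email import message_from_bytes, policy
from email.message import EmailMessage
from email.utils import parseaddr

HEADERS_OF_INTEREST = ("From", "Reply-To", "Return-Path", "Subject", "Date", "Message-ID")


def build_sample_message():
    """Build a made-up message so the sketch runs without a real mailbox."""
    msg = EmailMessage()
    msg["From"] = "Billing <billing@example.invalid>"
    msg["Reply-To"] = "someone.else@example.invalid"
    msg["To"] = "victim@example.invalid"
    msg["Subject"] = "Your statement is ready"
    msg.set_content("Please make a payment.")
    msg.add_attachment(b"%PDF-1.4 sample", maintype="application",
                       subtype="pdf", filename="statement.pdf")
    return msg.as_bytes()


def summarize_email(raw_bytes):
    """Extract headers, plain-text body parts, and attachment details."""
    msg = message_from_bytes(raw_bytes, policy=policy.default)
    headers = {}
    for name in HEADERS_OF_INTEREST:
        value = msg[name]
        headers[name] = str(value) if value is not None else None

    summary = {
        "headers": headers,
        # Each mail server on the route adds a Received line (header analysis).
        "received": [str(line) for line in msg.get_all("Received", [])],
        "body": [],
        "attachments": [],
    }
    for part in msg.walk():
        if part.is_multipart():
            continue
        content_type = part.get_content_type()
        payload = part.get_payload(decode=True) or b""
        filename = part.get_filename()
        if filename:
            summary["attachments"].append((filename, content_type, len(payload)))
        elif content_type == "text/plain":
            charset = part.get_content_charset() or "utf-8"
            summary["body"].append(payload.decode(charset, errors="replace"))

    # Hypothetical heuristic: replies routed to a different address than From.
    from_addr = parseaddr(headers["From"] or "")[1].lower()
    reply_addr = parseaddr(headers["Reply-To"] or "")[1].lower()
    summary["reply_to_mismatch"] = bool(reply_addr) and reply_addr != from_addr
    return summary


if __name__ == "__main__":
    print(summarize_email(build_sample_message()))
```

For a real case the raw bytes would come from a saved .eml file, and the extracted fields would then be compared against known genuine messages, as the talk suggests.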