Investigation Of Log Based Artifacts



So far, we have seen how to obtain artifacts from Windows using Python. In this chapter, let us learn how to investigate log-based artifacts using Python.

Introduction

Log-based artifacts are a treasure trove of information that can be very useful for a digital forensic expert. Although various monitoring tools collect this information, the main challenge is parsing the useful details out of the large volume of data they produce.

Various Log-based Artifacts and Investigating in Python

In this section, let us discuss various log-based artifacts and their investigation in Python −

Timestamps

A timestamp conveys the date and time of the activity in the log. It is one of the most important elements of any log file. Note that these date and time values can come in various formats; for example, the same instant may be stored as a Unix epoch value (seconds since 1970-01-01) or as a Windows FILETIME value (100-nanosecond intervals since 1601-01-01).

The Python script shown below takes a raw date-time value as input and provides a formatted timestamp as its output.

For this script, we need to follow these steps −

  • First, set up an argument handler that takes the raw date value along with the source of the date and the data type.

  • Then, define a class that provides a common interface for dates across different source formats.

Python Code

Let us see how to use Python code for this purpose −

First, import the following Python modules −

from __future__ import print_function
from argparse import ArgumentParser, ArgumentDefaultsHelpFormatter
from datetime import datetime as dt
from datetime import timedelta

import sys  # needed for sys.exit() in the parsing methods

Now, as usual, we need to provide arguments for the command-line handler; this block sits at the bottom of the finished script. It accepts three arguments − the date value to be processed, the source format of that date value, and its data type −

if __name__ == '__main__':
   parser = ArgumentParser('Timestamp Log-based artifact')
   parser.add_argument("date_value", help="Raw date value to parse")
   parser.add_argument(
      "source", help = "Source format of date",choices = ParseDate.get_supported_formats())
   parser.add_argument(
      "type", help = "Data type of input value",choices = ('number', 'hex'))
   
   args = parser.parse_args()
   date_parser = ParseDate(args.date_value, args.source, args.type)
   date_parser.run()
   print(date_parser.timestamp)

Now, we need to define the ParseDate class, which accepts the date value, the date source, and the value type −

class ParseDate(object):
   def __init__(self, date_value, source, data_type):
      self.date_value = date_value
      self.source = source
      self.data_type = data_type
      self.timestamp = None

Now, still within the ParseDate class, we will define a run() method that acts as a controller, much like a main() method, along with a class method that lists the supported source formats −

def run(self):
   if self.source == 'unix-epoch':
      self.parse_unix_epoch()
   elif self.source == 'unix-epoch-ms':
      self.parse_unix_epoch(True)
   elif self.source == 'windows-filetime':
      self.parse_windows_filetime()
@classmethod
def get_supported_formats(cls):
   return ['unix-epoch', 'unix-epoch-ms', 'windows-filetime']

Now, we need to define two more methods of the ParseDate class that will process Unix epoch time and Windows FILETIME values respectively. A Unix epoch value counts seconds (or milliseconds) since 1970-01-01, while a FILETIME value counts 100-nanosecond intervals since 1601-01-01, which is why the raw value is divided by 10 to obtain microseconds −

def parse_unix_epoch(self, milliseconds=False):
   if self.data_type == 'hex':
      conv_value = int(self.date_value, 16)
      if milliseconds:
         conv_value = conv_value / 1000.0
   elif self.data_type == 'number':
      conv_value = float(self.date_value)
      if milliseconds:
         conv_value = conv_value / 1000.0
   else:
      print("Unsupported data type '{}' provided".format(self.data_type))
      sys.exit(1)
   ts = dt.fromtimestamp(conv_value)
   self.timestamp = ts.strftime('%Y-%m-%d %H:%M:%S.%f')

def parse_windows_filetime(self):
   if self.data_type == 'hex':
      microseconds = int(self.date_value, 16) / 10.0
   elif self.data_type == 'number':
      microseconds = float(self.date_value) / 10
   else:
      print("Unsupported data type '{}' provided".format(self.data_type))
      sys.exit(1)
   # FILETIME counts 100-nanosecond intervals since 1601-01-01
   ts = dt(1601, 1, 1) + timedelta(microseconds=microseconds)
   self.timestamp = ts.strftime('%Y-%m-%d %H:%M:%S.%f')
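For example, a FILETIME value of 131497519470000000 (chosen here purely for illustration) corresponds to 13,149,751,947 seconds after 1601-01-01, which works out to 2017-09-13 04:52:27 UTC.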

After running the above script and providing a timestamp, we get the converted value in an easy-to-read format.
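For example, assuming the script above has been saved as timestamp_parser.py (a file name chosen here only for illustration), a Unix epoch value could be converted like this −

python timestamp_parser.py 1505278347 unix-epoch number

This prints a local-time string such as 2017-09-13 04:52:27.000000; the exact output depends on the machine's timezone, because datetime.fromtimestamp() converts to local time.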

Web Server Logs

From the point of view of a digital forensic expert, web server logs are another important artifact because they can yield useful usage statistics along with information about the user and geographical locations. The following Python script processes web server logs and creates a spreadsheet for easy analysis of the information.

First of all, we need to import the following Python modules −

from __future__ import print_function
from argparse import ArgumentParser, FileType

import re
import shlex
import logging
import sys
import csv

logger = logging.getLogger(__file__)

Now, we need to define the patterns that will be parsed from the logs −

iis_log_format = [
   ("date", re.compile(r"\d{4}-\d{2}-\d{2}")),
   ("time", re.compile(r"\d\d:\d\d:\d\d")),
   ("s-ip", re.compile(
      r"((25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)(\.|$)){4}")),
   ("cs-method", re.compile(
      r"(GET)|(POST)|(PUT)|(DELETE)|(OPTIONS)|(HEAD)|(CONNECT)")),
   ("cs-uri-stem", re.compile(r"([A-Za-z0-1/\.-]*)")),
   ("cs-uri-query", re.compile(r"([A-Za-z0-1/\.-]*)")),
   ("s-port", re.compile(r"\d*")),
   ("cs-username", re.compile(r"([A-Za-z0-1/\.-]*)")),
   ("c-ip", re.compile(
      r"((25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)(\.|$)){4}")),
   ("cs(User-Agent)", re.compile(r".*")),
   ("sc-status", re.compile(r"\d*")),
   ("sc-substatus", re.compile(r"\d*")),
   ("sc-win32-status", re.compile(r"\d*")),
   ("time-taken", re.compile(r"\d*"))]

Now, provide arguments for the command-line handler. Here it will accept three arguments − the IIS log to be processed, the desired CSV report path, and an optional path for the processing log −

if __name__ == '__main__':
   parser = ArgumentParser('Parsing Server Based Logs')
   parser.add_argument('iis_log', help = "Path to IIS Log",type = FileType('r'))
   parser.add_argument('csv_report', help = "Path to CSV report")
   parser.add_argument('-l', '--log', help = "Path to processing log",default=__name__ + '.log')
   args = parser.parse_args()
   logger.setLevel(logging.DEBUG)
   msg_fmt = logging.Formatter(
      "%(asctime)-15s %(funcName)-10s ""%(levelname)-8s %(message)s")
   
   strhndl = logging.StreamHandler(sys.stdout)
   strhndl.setFormatter(fmt = msg_fmt)
   fhndl = logging.FileHandler(args.log, mode = 'a')
   fhndl.setFormatter(fmt = msg_fmt)
   
   logger.addHandler(strhndl)
   logger.addHandler(fhndl)
   logger.info("Starting IIS Parsing ")
   logger.debug("Supplied arguments: {}".format(", ".join(sys.argv[1:])))
   logger.debug("System " + sys.platform)
   logger.debug("Version " + sys.version)
   main(args.iis_log, args.csv_report, logger)
   iologger.info("IIS Parsing Complete")

Now, we need to define the main() method, which handles the bulk processing of the log information −

def main(iis_log, report_file, logger):
   parsed_logs = []
   for raw_line in iis_log:
      line = raw_line.strip()
      log_entry = {}

      # Skip comment lines and blank lines
      if line.startswith("#") or len(line) == 0:
         continue

      if '\"' in line:
         line_iter = shlex.shlex(line)
      else:
         line_iter = line.split(" ")
      for count, split_entry in enumerate(line_iter):
         col_name, col_pattern = iis_log_format[count]
         if col_pattern.match(split_entry):
            log_entry[col_name] = split_entry
         else:
            logger.error("Unknown column pattern discovered. "
               "Line preserved in full below")
            logger.error("Unparsed Line: {}".format(line))
      parsed_logs.append(log_entry)

   logger.info("Parsed {} lines".format(len(parsed_logs)))
   cols = [x[0] for x in iis_log_format]

   logger.info("Creating report file: {}".format(report_file))
   write_csv(report_file, cols, parsed_logs)
   logger.info("Report created")

Lastly, we need to define a method that will write the output to a spreadsheet (CSV file) −

def write_csv(outfile, fieldnames, data):
   with open(outfile, 'w', newline="") as open_outfile:
      csvfile = csv.DictWriter(open_outfile, fieldnames)
      csvfile.writeheader()
      csvfile.writerows(data)

After running the above script, we will get the web server logs in a spreadsheet.
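For example, assuming the script has been saved as iis_parser.py (an illustrative name) and u_ex230418.log is a sample IIS log file, it could be invoked like this −

python iis_parser.py u_ex230418.log report.csv -l parsing.log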

Scanning Important Files using YARA

YARA is a pattern matching utility designed for malware identification and incident response. In the following Python script, we will use YARA to scan files.

We can install the Python bindings for YARA with the help of the following command −

pip install yara-python
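Before scanning anything, we need at least one rule file to compile. The rule below is a minimal, hypothetical example (saved, say, as example_rule.yar) that flags any file containing a suspicious string −

rule suspicious_string
{
    strings:
        $a = "cmd.exe /c" nocase
    condition:
        $a
}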

We can follow the steps given below for using YARA rules to scan files −

  • First, set up and compile the YARA rules.

  • Then, scan a single file and then iterate through the directories to process individual files.

  • Lastly, we will export the result to CSV.

Python Code

Let us see how to use Python code for this purpose −

First, we need to import the following Python modules −

from __future__ import print_function
from argparse import ArgumentParser, ArgumentDefaultsHelpFormatter

import os
import csv
import yara

Next, provide arguments for the command-line handler. Note that here it will accept two positional arguments − the path to the YARA rules and the file or folder to be scanned − plus an optional --output path for a CSV report −

if __name__ == '__main__':
   parser = ArgumentParser('Scanning files by YARA')
   parser.add_argument(
      'yara_rules',help = "Path to Yara rule to scan with. May be file or folder path.")
   parser.add_argument('path_to_scan',help = "Path to file or folder to scan")
   parser.add_argument('--output',help = "Path to output a CSV report of scan results")
   args = parser.parse_args()
   main(args.yara_rules, args.path_to_scan, args.output)

Now we will define the main() function, which accepts the path to the YARA rules and the path of the file or folder to be scanned −

def main(yara_rules, path_to_scan, output):
   if os.path.isdir(yara_rules):
      # yara.compile() cannot take a folder directly, so gather every rule
      # file in the folder and compile them together as one ruleset
      rule_paths = {}
      for root, _, files in os.walk(yara_rules):
         for name in files:
            if name.lower().endswith(('.yar', '.yara')):
               rule_paths[os.path.join(root, name)] = os.path.join(root, name)
      yrules = yara.compile(filepaths=rule_paths)
   else:
      yrules = yara.compile(filepath=yara_rules)
   if os.path.isdir(path_to_scan):
      match_info = process_directory(yrules, path_to_scan)
   else:
      match_info = process_file(yrules, path_to_scan)
   columns = ['rule_name', 'hit_value', 'hit_offset', 'file_name',
      'rule_string', 'rule_tag']

   if output is None:
      write_stdout(columns, match_info)
   else:
      write_csv(output, columns, match_info)

Now, define a method that iterates through the directory and passes each file to another method for further processing −

def process_directory(yrules, folder_path):
   match_info = []
   for root, _, files in os.walk(folder_path):
      for entry in files:
         file_entry = os.path.join(root, entry)
         match_info += process_file(yrules, file_entry)
   return match_info

Next, define two functions. The first calls the match() method on the yrules object, and the second reports the match information to the console when the user does not specify an output file. Observe the code shown below −

def process_file(yrules, file_path):
   match = yrules.match(file_path)
   match_info = []

   for rule_set in match:
      # Each hit is an (offset, identifier, data) tuple in the classic
      # yara-python API; very recent releases return StringMatch objects instead
      for hit in rule_set.strings:
         match_info.append({
            'file_name': file_path,
            'rule_name': rule_set.rule,
            'rule_tag': ",".join(rule_set.tags),
            'hit_offset': hit[0],
            'rule_string': hit[1],
            'hit_value': hit[2]
         })
   return match_info

def write_stdout(columns, match_info):
   for entry in match_info:
      for col in columns:
         print("{}: {}".format(col, entry[col]))
      print("=" * 30)

Lastly, we will define a method that will write the output to a CSV file, as shown below −

def write_csv(outfile, fieldnames, data):
   with open(outfile, 'w', newline="") as open_outfile:
      csvfile = csv.DictWriter(open_outfile, fieldnames)
      csvfile.writeheader()
      csvfile.writerows(data)

Once we run the above script with appropriate arguments at the command line, we can generate a CSV report of the scan results.
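For example, assuming the script has been saved as yara_scanner.py and the hypothetical rule file shown earlier is used (both names are illustrative), a scan could be started like this −

python yara_scanner.py example_rule.yar /evidence/files --output yara_report.csv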
