wanzhuz/cleaning-data

Data Cleaning and Analyzing

Overview: Clean and organize log data to identify and understand trends. As computers and operating systems run, the events that occur are recorded in various log files. These events include:

  • attempted logins to the machine from others
  • logins from the same machine
  • running commands as another user via sudo
  • adding users

We want to monitor all login attempts to help identify and prevent potential attacks on the machine. The data is a single merged log file (10.9 MB) that combines 5 log files from different machines. Each log line usually consists of METADATA followed by the message from the machine.

Task

We create a data-frame with one row per log message line. The data-frame has the following columns:

  • date-time
  • name of host
  • application (app)
  • process ID (PID)
  • message
  • name of the log file
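The column layout above can be sketched as a small parser. This is only an illustration under stated assumptions: the line pattern assumes a syslog-style layout (`Mon DD HH:MM:SS host app[pid]: message`), the `# filename` header that separates the original logs inside the merged file is a hypothetical marker, and the `year` argument is supplied by the caller because such timestamps carry no year. The function name `parse_merged_log` is also made up for this sketch. It uses only the standard library and returns a list of dictionaries, one per log line, which can be handed to a data-frame constructor.

```python
import re
from datetime import datetime

# Assumed syslog-style layout: "Mon DD HH:MM:SS host app[pid]: message".
# The "[pid]" part is optional because some programs log without one.
LINE_RE = re.compile(
    r"^(?P<datetime>[A-Z][a-z]{2}\s+\d{1,2} \d{2}:\d{2}:\d{2}) "
    r"(?P<host>\S+) "
    r"(?P<app>[^\[:]+?)(?:\[(?P<pid>\d+)\])?: "
    r"(?P<message>.*)$"
)
# Hypothetical marker that introduces each original log inside the merged file.
HEADER_RE = re.compile(r"^# (?P<file>\S+)$")


def parse_merged_log(lines, year=2024):
    """Return one dict per log message line, tagged with its source log file."""
    rows, current_file = [], None
    for line in lines:
        line = line.rstrip("\n")
        header = HEADER_RE.match(line)
        if header:
            current_file = header.group("file")
            continue
        match = LINE_RE.match(line)
        if match is None:
            continue  # blank line or a layout this sketch does not recognise
        row = match.groupdict()
        # The timestamp has no year, so the caller-supplied one is prepended.
        row["datetime"] = datetime.strptime(
            f"{year} {row['datetime']}", "%Y %b %d %H:%M:%S"
        )
        row["file"] = current_file
        rows.append(row)
    return rows
```

Lines that do not match the assumed layout are skipped silently here; counting them instead would be a reasonable way to find out how often the "usually" in the log structure fails.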

We check and analyze the data by:

  • checking that all PIDs are numeric
  • verifying the number of lines in each log file
  • comparing the range of date-times across all messages with the range within each log file
  • finding how many days each log file spans
  • looking for patterns in the application names
  • checking whether the host value is constant for all records in each log file
  • identifying the most common daemons/programs logging information on each host
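Several of these checks can be sketched with the standard library alone. The code below assumes each record is a dictionary with `datetime`, `host`, `app`, `pid`, and `file` keys; those key names, the helper names, and the `sample_rows` values are all hypothetical and exist only to make the example runnable.

```python
from collections import Counter, defaultdict
from datetime import datetime


def non_numeric_pids(rows):
    """Rows whose PID is present but not made up of digits only."""
    return [
        r for r in rows
        if r["pid"] is not None and not str(r["pid"]).isdigit()
    ]


def summarize_by_file(rows):
    """Per log file: line count, first/last date-time, days spanned, hosts."""
    grouped = defaultdict(list)
    for r in rows:
        grouped[r["file"]].append(r)
    summary = {}
    for name, group in grouped.items():
        first = min(r["datetime"] for r in group)
        last = max(r["datetime"] for r in group)
        summary[name] = {
            "lines": len(group),
            "first": first,
            "last": last,
            # Calendar days touched, counting both end points.
            "days": (last.date() - first.date()).days + 1,
            # A single entry here means the host is constant for the file.
            "hosts": sorted({r["host"] for r in group}),
        }
    return summary


def top_apps_by_host(rows, n=3):
    """The n most frequent application names for each host."""
    counts = defaultdict(Counter)
    for r in rows:
        counts[r["host"]][r["app"]] += 1
    return {host: c.most_common(n) for host, c in counts.items()}


# Made-up records, for illustration only.
sample_rows = [
    {"datetime": datetime(2023, 11, 30, 6, 39), "host": "host-a",
     "app": "sshd", "pid": "101", "file": "auth.log"},
    {"datetime": datetime(2023, 12, 2, 8, 0), "host": "host-a",
     "app": "sshd", "pid": "102", "file": "auth.log"},
    {"datetime": datetime(2023, 12, 1, 9, 15), "host": "host-a",
     "app": "sudo", "pid": None, "file": "auth.log"},
]
```

Comparing the overall date-time range with the per-file ranges then amounts to taking the minimum of the `first` values and the maximum of the `last` values in the summary.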

Next, we separate valid logins from invalid ones. This helps us identify which IP addresses and users to watch.

  • collect the usernames and IP addresses of the valid logins
  • collect the usernames and IP addresses of the invalid logins
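One way to collect these pairs is to match the message text against known login message shapes. The sketch below assumes the logins are reported by OpenSSH's `sshd`, which writes lines beginning `Accepted <method> for <user> from <ip> port <n>` and `Invalid user <user> from <ip>`; logs from other services would need different patterns. The function name `collect_logins` is made up for this example.

```python
import re
from collections import Counter

# Message shapes written by OpenSSH's sshd (an assumption about the data).
VALID_RE = re.compile(
    r"^Accepted (?P<method>\S+) for (?P<user>\S+) from (?P<ip>\S+) port \d+"
)
# \S* rather than \S+ because the attempted user name can be empty.
INVALID_RE = re.compile(r"^Invalid user (?P<user>\S*) from (?P<ip>\S+)")


def collect_logins(messages):
    """Count (user, ip) pairs for valid and invalid logins separately."""
    valid, invalid = Counter(), Counter()
    for msg in messages:
        match = VALID_RE.match(msg)
        if match:
            valid[(match["user"], match["ip"])] += 1
            continue
        match = INVALID_RE.match(msg)
        if match:
            invalid[(match["user"], match["ip"])] += 1
    return valid, invalid
```

Counting the pairs rather than only collecting them makes repeated attempts from one address stand out. Failed passwords for user names that do exist are not covered by either pattern in this sketch.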

Finally, we identify the executables/programs run via sudo, along with the user who ran each one and the machine it ran on.
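A possible sketch of this step follows. It assumes sudo's usual log entry layout, `<user> : TTY=... ; PWD=... ; USER=<target> ; COMMAND=<command line>`, and records stored as dictionaries with hypothetical `host`, `app`, and `message` keys. The executable is taken to be the first whitespace-separated token of the command, which would be wrong for paths that contain spaces.

```python
import re

# Assumed sudo entry layout: "user : TTY=... ; PWD=... ; USER=root ; COMMAND=..."
SUDO_RE = re.compile(
    r"^\s*(?P<user>\S+) : .*?USER=(?P<run_as>\S+) ; COMMAND=(?P<command>.+)$"
)


def sudo_commands(rows):
    """List the executable, invoking user, target user and host per sudo entry."""
    found = []
    for r in rows:
        if r["app"] != "sudo":
            continue
        match = SUDO_RE.match(r["message"])
        if match is None:
            continue  # e.g. PAM session lines that sudo also produces
        found.append({
            "host": r["host"],
            "user": match["user"],
            "run_as": match["run_as"],
            # First token of the command line; arguments are dropped.
            "executable": match["command"].split()[0],
        })
    return found
```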
