For program documentation, see ../README.md
The logdissect module contains utilities for parsing, merging, filtering, and exporting log data.
The logdissect module comes with the logdissect log analysis program. It contains objects which can be used to parse log lines and files, merge and filter logs, and output to a few formats.
import logdissect
# Parsing
myparser = logdissect.parsers.<parser>.ParseModule()
attribute_dict = myparser.parse_line('<RAW_LINE>')
file_dict = myparser.parse_file('<PATH/TO/FILE>')
# Filtering
myfilter = logdissect.filters.<filter>.FilterModule()
filtered_dict = myfilter.filter_data(file_dict, values=['<VALUE1>', '<VALUE2>'])
# Output
myoutput = logdissect.output.<output>.OutputModule()
myoutput.write_output(data, filename='FILENAME')
# Time Zones
entry = logdissect.utils.convert_standard_datestamp(entry)
entry = logdissect.utils.convert_nodate_datestamp(entry, datetimeobject)
entry = logdissect.utils.convert_iso_datestamp(entry)
entry = logdissect.utils.convert_unix_datestamp(entry)
entry = logdissect.utils.convert_now_datestamp(entry)
entry = logdissect.utils.get_utc_date(entry)
# Merging
log_dict = logdissect.utils.merge_logs(dataset, sort={True|False})
Replace <parser> with one of the available parsers:
- ciscoios - Cisco IOS logs
- emerge - gentoo emerge logs
- linejson - logdissect object-per-line JSON output
- sojson - logdissect single-object JSON output
- syslog - standard syslog
- syslogiso - syslog with ISO 8601 datestamp
- syslognohost - syslog with no host attribute
- tcpdump - tcpdump terminal output
- webaccess - web access logs
- windowsrsyslog - windows rsyslog agent forwarded logs
Parsers have two methods, parse_file() and parse_line() (except the sojson parser, which has no parse_line() method).
parse_file() accepts a filename as input and returns a dictionary with some metadata and a list of entry dictionaries (entries).
Parsers have a tzone attribute that uses a standard ISO 8601 offset to UTC (e.g. +0500, -0200); if not set, logdissect will attempt to get current time zone data from the local system (unless a time zone is already present, such as in the syslogiso parser or the sojson parser).
Parsers for timestamp formats with no year use file modification times to assign years to date stamps. This allows them to parse files that span more than one year without a problem. If you are copying a log file, always preserve original mtimes using cp -p (or cp --preserve=timestamps) and scp -p.
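The idea of deriving a year from a file's modification time can be illustrated with a minimal sketch. This is not logdissect's implementation: guess_year and file_mtime are hypothetical helpers, and the rule used here (an entry month later in the calendar than the mtime month belongs to the previous year) is an assumption made for illustration.

```python
import os
from datetime import datetime


def guess_year(entry_month, mtime):
    """Guess the year of a yearless entry from a file modification time.

    Assumption: entries were written no later than the mtime, so a month
    later in the calendar than the mtime month belongs to the previous year.
    """
    if entry_month > mtime.month:
        return mtime.year - 1
    return mtime.year


def file_mtime(path):
    # Read the modification time that cp -p and scp -p preserve.
    return datetime.fromtimestamp(os.path.getmtime(path))


# A December entry in a file last modified in January 2018
print(guess_year(12, datetime(2018, 1, 5)))
```

If the mtime is lost in a copy, a heuristic like this would assign years relative to the copy date, which is why preserving timestamps matters.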
parse_line() accepts a log line as input and returns a dictionary of strings. There are two built-in keys, raw_text and parser, and parsers can add their own keys.
Parsers have a datestamp_type attribute that defines how timestamps will be converted. The options are as follows:
- standard - standard syslog date stamps
- nodate - time stamps with no date (e.g. tcpdump)
- iso - ISO 8601 timestamps
- webaccess - web access log date stamps
- unix - Unix timestamps
- now - always set date stamp to time parsed
- None - skip conversion
Conversion happens with any parser that has a date_stamp field in fields (the now datestamp type doesn't require a date_stamp field), and adds the following attributes to the entry dictionary:
- year - a 4-digit string (or None)
- month - a 2-digit string
- day - a 2-digit string
- tstamp - a 6-digit string with optional decimal and extra places
- tzone - + or - followed by a 4-digit offset to UTC (HHMM)
- numeric_date_stamp - a datestamp in the form of YYYYmmddHHMMSS[.ffffff]
- date_stamp - a standard date stamp (added for now datestamp type only)
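To make the attribute formats concrete, here is a minimal standard-library sketch that derives them from a Unix timestamp. It does not call logdissect: convert_unix_sketch is a hypothetical name, and it uses UTC so the result is reproducible, whereas the library may work in local time.

```python
from datetime import datetime, timezone


def convert_unix_sketch(entry):
    """Add date fields to an entry whose date_stamp is a Unix timestamp."""
    # Split on the decimal point so fractional seconds are kept as text.
    seconds, _, fraction = entry['date_stamp'].partition('.')
    # UTC is used here so the result does not depend on the local system.
    moment = datetime.fromtimestamp(int(seconds), tz=timezone.utc)
    entry['year'] = moment.strftime('%Y')
    entry['month'] = moment.strftime('%m')
    entry['day'] = moment.strftime('%d')
    entry['tstamp'] = moment.strftime('%H%M%S')
    if fraction:
        entry['tstamp'] += '.' + fraction
    entry['numeric_date_stamp'] = (
        entry['year'] + entry['month'] + entry['day'] + entry['tstamp'])
    return entry


entry = convert_unix_sketch({'date_stamp': '1500000000.25'})
print(entry['numeric_date_stamp'])
```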
The sojson parser has no parse_line() method.
There is a blank parser that can be used to create custom parsers on the fly.
This example will create a parser to capture a unix timestamp with a colon followed by a message:
myparser = logdissect.parsers.blank.ParseModule()
myparser.name = 'my parser'
myparser.format_regex = r'^(\d+\.?\d*):\s(.*)$'
myparser.fields = ['date_stamp', 'message']
myparser.datestamp_type = 'unix'
myparser could then be used like any other parse module. You can also define a post_parse_action method if you need to customize entries after they have been parsed. It should accept and return an entry dictionary. The inherited post_parse_action method returns the entry without changing it.
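The following sketch shows what the example regular expression captures and how a post_parse_action hook could adjust an entry. It uses only the standard library and imitates the behavior described here rather than calling logdissect; parse_line_sketch is a hypothetical function, and returning None for a non-matching line is an assumption.

```python
import re

FORMAT_REGEX = r'^(\d+\.?\d*):\s(.*)$'
FIELDS = ['date_stamp', 'message']


def parse_line_sketch(line):
    """Map regex groups onto field names, roughly as a parser would."""
    match = re.match(FORMAT_REGEX, line)
    if match is None:
        # Assumption: a line that does not match yields no entry.
        return None
    entry = {'raw_text': line, 'parser': 'my parser'}
    entry.update(zip(FIELDS, match.groups()))
    return entry


def post_parse_action(entry):
    """Example customization: normalize the message to lower case."""
    entry['message'] = entry['message'].lower()
    return entry


entry = post_parse_action(parse_line_sketch('1500000000.25: Disk FULL'))
print(entry['date_stamp'], entry['message'])
```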
Replace <filter> with one of the available filters:
- source - match a log source
- rsource - filter out a log source
- range - match a time range
- last - match a preceding time range
- grep - match entries containing a regular expression
- rgrep - filter out entries containing a regular expression
- shost - match a source host
- rshost - filter out a source host
- dhost - match a destination host
- rdhost - filter out a destination host
- process - match a source process
- rprocess - filter out a source process
- protocol - match a protocol
- rprotocol - filter out a protocol
Filters have one method, filter_data. Usage for all filters except last and range:
filtered_dict = myfilter.filter_data(data, values=['<VALUE1>', '<VALUE2>'])
data should be a log dictionary, with an entries value that contains a list of event dictionaries. values is a list containing strings to match or filter out.
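As an illustration of the data shapes involved, here is a minimal grep-style filter written with the standard library. It is a sketch, not the library's code: grep_sketch is a hypothetical function, and matching against raw_text is an assumption.

```python
import re


def grep_sketch(data, values):
    """Keep entries whose raw_text matches any regular expression in values."""
    patterns = [re.compile(value) for value in values]
    matched = [
        entry for entry in data['entries']
        if any(pattern.search(entry['raw_text']) for pattern in patterns)
    ]
    # Copy the metadata and swap in the filtered entry list.
    return dict(data, entries=matched)


data = {
    'entries': [
        {'raw_text': 'sshd[101]: Accepted password for alice'},
        {'raw_text': 'cron[202]: job started'},
    ]
}
print(len(grep_sketch(data, values=['sshd', 'sudo'])['entries']))
```

Returning a new dictionary leaves the input log dictionary unchanged, so the same data can be passed through several filters independently.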
Syntax for the last and range filters differs slightly. Instead of values, they are passed value, which is a single string. The format of value:
- range filter - YYYYmmddHHMMSS-YYYYmmddHHMMSS (time values can be shortened; the filter will fill in 0s)
- last filter - a number, followed by either m for minutes, h for hours, or d for days (e.g. 20m)
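The two value formats can be interpreted along these lines. This is a minimal sketch under stated assumptions: parse_last and parse_range are hypothetical helpers, and padding to 14 digits with trailing zeros is one reading of "fill in 0s".

```python
from datetime import timedelta

UNITS = {'m': 'minutes', 'h': 'hours', 'd': 'days'}


def parse_last(value):
    """Turn a value such as '20m' into a timedelta."""
    amount, unit = int(value[:-1]), value[-1]
    return timedelta(**{UNITS[unit]: amount})


def parse_range(value):
    """Split 'START-FINISH' and pad shortened time values with zeros."""
    start, finish = value.split('-')
    return start.ljust(14, '0'), finish.ljust(14, '0')


print(parse_last('20m'))
print(parse_range('2017-201802'))
```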
Time-based filters filter on the numeric_date_stamp value. The range filter also has a utc keyword argument that defaults to False. If set to True, it will filter based on numeric_date_stamp_utc.
Replace <output> with one of the available output modules:
- log - outputs to standard log file format
- sojson - outputs entry list to a single JSON object
- linejson - outputs one JSON entry dictionary object per line
Output modules have one method, write_output. Usage:
myoutput.write_output(data, filename='FILENAME')
data should be a log dictionary, with an entries value that contains a list of event dictionaries.
The log output module also has a label keyword argument with a few possible settings. If set to 'fname', it will add source file names to the output. If set to 'fpath', it will add full source file paths to the output.
The sojson output module has a pretty keyword argument. If set to True, the output will be formatted in a nice, human-readable style. The default is False.
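The difference between the two JSON styles can be sketched with the standard json module. These functions are hypothetical stand-ins that return strings instead of writing files; the indent width and key sorting are assumptions, not the library's settings.

```python
import json


def linejson_sketch(data):
    """One JSON object per line, one line per entry."""
    return '\n'.join(
        json.dumps(entry, sort_keys=True) for entry in data['entries'])


def sojson_sketch(data, pretty=False):
    """The whole entry list as a single JSON document."""
    indent = 2 if pretty else None
    return json.dumps(data['entries'], indent=indent, sort_keys=True)


data = {'entries': [{'message': 'a'}, {'message': 'b'}]}
print(linejson_sketch(data))
print(sojson_sketch(data, pretty=True))
```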
import logdissect.utils
entry = logdissect.utils.convert_standard_datestamp(entry)
entry = logdissect.utils.convert_nodate_datestamp(entry, datetimeobject)
entry = logdissect.utils.convert_iso_datestamp(entry)
entry = logdissect.utils.convert_unix_datestamp(entry)
entry = logdissect.utils.convert_now_datestamp(entry)
The nodate converter uses a datetime object to assign date values. Date stamp converters assign the following fields, based on an entry dictionary's date_stamp value:
- year - a 4-digit string (set to None for standard converter)
- month - a 2-digit string
- day - a 2-digit string
- tstamp - a 6-digit string, with optional decimal point and fractional seconds
- numeric_date_stamp - a string with format YYYYmmddHHMMSS[.ffffff] (not set for standard converter)
logdissect.utils contains the following datestamp converters:
- standard - standard syslog datestamps
- nodate - timestamps with no date
- iso - ISO 8601 timestamps
- webaccess - web access log date stamps
- unix - Unix timestamps
- now - use the current time
get_utc_date() sets the numeric_date_stamp_utc value based on the numeric_date_stamp value and the tzone value, and returns the entry.
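The relationship between numeric_date_stamp, tzone, and numeric_date_stamp_utc can be sketched as follows. This is an illustration using the standard library, not the library's code; utc_stamp_sketch is a hypothetical name, and it assumes tzone has the +HHMM or -HHMM form.

```python
from datetime import datetime, timedelta


def utc_stamp_sketch(entry):
    """Derive numeric_date_stamp_utc from numeric_date_stamp and tzone."""
    stamp, _, fraction = entry['numeric_date_stamp'].partition('.')
    tzone = entry['tzone']
    offset = timedelta(hours=int(tzone[1:3]), minutes=int(tzone[3:5]))
    local = datetime.strptime(stamp, '%Y%m%d%H%M%S')
    # A positive offset means local time is ahead of UTC, so subtract it.
    utc = local - offset if tzone[0] == '+' else local + offset
    result = utc.strftime('%Y%m%d%H%M%S')
    if fraction:
        result += '.' + fraction
    entry['numeric_date_stamp_utc'] = result
    return entry


entry = {'numeric_date_stamp': '20170714074000', 'tzone': '+0500'}
print(utc_stamp_sketch(entry)['numeric_date_stamp_utc'])
```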
Returns the local time zone.
merge_logs() merges multiple log dictionaries together and returns a single log dictionary. dataset is a dictionary with some metadata and a data_set value, which is a list of log dictionaries. Each log dictionary contains some metadata and an entries value, which is a list of event dictionaries.
If sort is set to True, entries will be sorted by their numeric_date_stamp_utc value. The default is False.
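The nested structure of dataset is easier to see in a small example. The sketch below merges and optionally sorts entries with the standard library; merge_sketch is a hypothetical stand-in, and the metadata keys of the returned dictionary are omitted because they are not specified here.

```python
from decimal import Decimal


def merge_sketch(dataset, sort=False):
    """Combine the entries of every log dictionary in dataset['data_set']."""
    entries = []
    for log in dataset['data_set']:
        entries.extend(log['entries'])
    if sort:
        # Decimal keeps fractional seconds exact when comparing stamps.
        entries.sort(key=lambda e: Decimal(e['numeric_date_stamp_utc']))
    return {'entries': entries}


dataset = {
    'data_set': [
        {'entries': [{'numeric_date_stamp_utc': '20170714024000.5'}]},
        {'entries': [{'numeric_date_stamp_utc': '20170714024000'}]},
    ]
}
merged = merge_sketch(dataset, sort=True)
print([e['numeric_date_stamp_utc'] for e in merged['entries']])
```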
MIT License
Copyright (c) 2017 Dan Persons ([email protected])
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.