LogPrecis: Unleashing Language Models for Automated Malicious Log Analysis

Precis: a concise summary of essential points, statements, or facts

Fingerprint graph similarities of UNIX attacks

Info

This webpage contains additional material on the paper:

LogPrecis: Unleashing Language Models for Automated Malicious Log Analysis

Abstract

Security logs are the key to understanding attacks and diagnosing vulnerabilities. Often coming in the form of text logs, their analysis remains a daunting challenge. Language Models (LMs) have demonstrated unmatched potential in understanding natural and programming languages. The question arises as to whether and how LMs could be also used to automatize the analysis of security logs. We here systematically study how to benefit from the state-of-the-art LM to support the analysis of text-like Unix shell attack logs automatically. For this, we thoroughly designed LogPrécis. LogPrécis receives as input malicious shell sessions. It then automatically identifies and assigns the attacker tactic to each portion of the session, i.e., unveiling the sequence of the attacker’s goals. This creates a unique attack fingerprint. We demonstrate LogPrécis capability to support the analysis of two large datasets containing about 400,000 unique Unix shell attacks recorded in a 2-year-long honeypot deployment. LogPrécis reduces the analysis to about 3,000 unique fingerprints. Such abstraction lets us better understand attacks, extract attack prototypes, detect novelties, and track families and mutations. Overall, LogPrécis, released as open source, demonstrates the potential of adopting LMs for security analysis and paves the way for better and more responsive defense against cyberattacks.

The data

You can find the data at this link:

Cowrie_Data

Each line represents one interaction between Cowrie and the attacker. The column “session_id” is a session identifier. The column “statements” contains the list of statements associated with the interaction under analysis. The column “timestamps” contains the date and time at which the interaction occurred. To obtain the entire bash session (from login to logout), group by session_id.
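The grouping step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the rows below are invented and are not taken from the dataset, the timestamps are assumed to be ISO 8601 strings, and `rebuild_sessions` is a hypothetical helper name. Only the three column names come from the description above. The file format of the download is not specified here, so loading is left out.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical rows mimicking the three documented columns. The values are
# invented for illustration and do not come from the dataset.
rows = [
    {"session_id": "s1", "statements": ["uname -a"],
     "timestamps": "2021-01-01T10:00:05"},
    {"session_id": "s2", "statements": ["wget http://example.com/x", "chmod +x x"],
     "timestamps": "2021-01-01T11:00:00"},
    {"session_id": "s1", "statements": ["cd /tmp"],
     "timestamps": "2021-01-01T10:00:01"},
]


def rebuild_sessions(rows):
    """Group interactions by session_id and join their statements in time order."""
    by_session = defaultdict(list)
    for row in rows:
        by_session[row["session_id"]].append(row)

    sessions = {}
    for session_id, interactions in by_session.items():
        # Assumption: timestamps parse as ISO 8601; adapt if the real format differs.
        interactions.sort(key=lambda r: datetime.fromisoformat(r["timestamps"]))
        # Flatten the per-interaction statement lists into one session-level list.
        sessions[session_id] = [s for r in interactions for s in r["statements"]]
    return sessions


sessions = rebuild_sessions(rows)
```

Sorting by timestamp before flattening matters because the rows of one session are not guaranteed to be stored in order. With a dataframe library, the same result would come from sorting on “timestamps” and then grouping on “session_id”.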

The code

You can find the paper’s code at this link:

LogPrecis code

Demo