Data Sources: A Deeper Dive

What We (and You) Get From Your Data

What Epiphany Uses Data for

Essentially, all customer-provided data ingested by the Epiphany Intelligence Platform serves one or more of these purposes:

Provide or enrich identity. Epiphany uses identity to correlate which of a person’s identities can do what, and what level of privilege that person has. Identities can span multiple systems and access mechanisms.
Identify and adjust friction coefficients. The data sources, where applicable, are used to determine weighting and costs for attack paths. Two paths with the same number of traversals are weighted in part by the amount of resistance or friction an attacker will encounter to cross a path or establish persistence at a foothold. Functioning anti-malware tools, network security appliances, and other components that detect, prevent, or provide tighter port, protocol, or access controls serve as resistance points.
Determine exploitability and prioritization. Due to the sheer volume of vulnerabilities and the rate at which they are discovered, it’s not practical to expect them to be patched or mitigated within an SLA shorter than the time in which they can be exploited. Because other platforms lack the context to see critical assets, how they can be attacked, or where resistance points lie, efficiently prioritizing the remediation of vulnerabilities was not previously possible. It also means that vulnerabilities that are not exploitable in a way detrimental to a specific environment could not easily be identified. For this reason, data on vulnerabilities in the environment is combined with the identity and friction data described above to determine the priorities of remediation efforts.
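The friction idea in the second item can be sketched in a few lines. This is a minimal illustration only: the Hop structure, the friction scale, and the additive cost formula are assumptions made for the example, not Epiphany's actual weighting model.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Hop:
    """One traversal in an attack path (hypothetical structure)."""
    target: str
    friction: float  # assumed scale: 0.0 = no resistance, higher = harder to cross


def path_cost(path: List[Hop], base_cost: float = 1.0) -> float:
    """Add a fixed per-hop cost to each hop's friction and sum over the path."""
    return sum(base_cost + hop.friction for hop in path)


# Two paths with the same number of traversals but different resistance.
path_a = [Hop("workstation", 0.0), Hop("file-server", 0.5)]
# Assumed: working EDR on the workstation and tight access controls on the server.
path_b = [Hop("workstation", 2.0), Hop("file-server", 3.0)]

cost_a = path_cost(path_a)
cost_b = path_cost(path_b)
easier = "A" if cost_a < cost_b else "B"
print(cost_a, cost_b, easier)
```

Both paths have two hops, yet path A is the cheaper one for an attacker because it meets less resistance along the way.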

How Epiphany Classifies Data Sources

Data sources often contain overlapping data, which lets Epiphany join these sources, perform analysis and correlation, and return an objective truth. At the time of this writing, Epiphany has four categories of data source. Depending on the specific source, more elements may be collected in one category than in others. At a high level, the fundamentals of each category are described next.

Identity

As alluded to earlier, identity data provides the baseline for permission barriers. Typically, people (subjects) have user accounts that are members of groups or are assigned roles that correlate with permissions to certain assets (objects). Data sources that fall into the identity category provide:

People

A uniquely identifiable attribute representative of a person.
A detail of the specific object-level permissions that a user’s account has.
A detail of the groups or roles to which a user belongs.
A historical account of the actual use of permissions by a specific user (for example, a session).

Groups and Roles

A detail of the users that have membership in a group or role.
A detail of the object-level permissions that a group or role has.
Associations between groups or roles (for example, group one is part of group two) and the effective permission and object set.
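The effect of group associations on a user's effective permissions can be sketched as follows. This is a minimal illustration under stated assumptions: the dictionary layout, the user and group names, and the permission strings are all hypothetical and do not reflect how Epiphany or any directory service stores this data.

```python
from typing import Dict, List, Set


def effective_permissions(
    user: str,
    direct_perms: Dict[str, Set[str]],
    user_groups: Dict[str, List[str]],
    group_parents: Dict[str, List[str]],
    group_perms: Dict[str, Set[str]],
) -> Set[str]:
    """Union a user's direct permissions with those of every group reached
    through direct membership or nested group-in-group associations."""
    perms = set(direct_perms.get(user, set()))
    seen: Set[str] = set()
    stack = list(user_groups.get(user, []))
    while stack:
        group = stack.pop()
        if group in seen:  # guards against circular group nesting
            continue
        seen.add(group)
        perms |= group_perms.get(group, set())
        stack.extend(group_parents.get(group, []))
    return perms


# Hypothetical data: "helpdesk" is part of "server-admins".
direct_perms = {"alice": {"read:wiki"}}
user_groups = {"alice": ["helpdesk"]}
group_parents = {"helpdesk": ["server-admins"], "server-admins": ["helpdesk"]}
group_perms = {
    "helpdesk": {"reset:password"},
    "server-admins": {"rdp:file-server"},
}

alice = effective_permissions(
    "alice", direct_perms, user_groups, group_parents, group_perms
)
print(sorted(alice))
```

In this example alice was never granted rdp:file-server directly, but she inherits it because helpdesk is nested in server-admins.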

There is also object-level data that is provided by these types of data sources, such as basic inventory and object identifiers.

While this is not a comprehensive list, sources such as Microsoft Active Directory, Microsoft Azure Active Directory, and Okta provide this type of data.

Devices

Device data provides information about the state and resilience of devices. An attacker’s initial objective is to establish a foothold in a targeted environment. This foothold represents a persistence point from where an attacker can perform reconnaissance and pivot to other devices. As an attacker tries to traverse from the foothold to another device, how possible is it for this next device to become a foothold itself? Is the device resilient enough to thwart such an attempt?

Having exploitable vulnerabilities or accounts with access that should not be permitted are examples of how footholds can be established. Antivirus, endpoint detection and response (EDR) and other tools with the ability to react to questionable conduct increase resilience. Data from these tools and their configurations are evaluated by Epiphany’s machine learning model to determine effectiveness against risks. Data sources that fall in this category provide:

OS and application inventory.
Identity information (used for correlation).
Vulnerability data (if present).
Presence of a countermeasure.

Epiphany supports several common antivirus and EDR vendors, generally using read-only, API-level access to collect this data.

Vulnerability

While the platform leverages many other data sources outside of the customer environment for analysis, vulnerability data sources provide specific information about the state of a device from an exploitability perspective. Vulnerability scanners and agents typically gather useful information beyond the vulnerabilities themselves, such as users, installed applications, indicators of compromise (IOCs), and more. Epiphany uses all of this information, together with the presence of vulnerabilities known to be exploitable, to gauge how easily an attacker can compromise a device. Correlating these data points helps Epiphany understand in depth the risk a device poses to an environment, especially if the vulnerabilities exist on devices that have paths to critical assets or are used by identities with roles capable of accessing critical assets.

It is important to note that Epiphany does not simply take vulnerability details and conclude that a device is vulnerable and therefore a risk. It employs an analysis and exploitability scoring mechanism, one of the product’s distinguishing features (and part of its intellectual property), to determine whether the vulnerability on a particular device could be exploited given its location on the network.
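The general idea of weighing a vulnerability by where its device sits can be sketched as below. This is not Epiphany's scoring mechanism, which is proprietary; the graph, the finding fields, and the simple keep-or-drop rule are assumptions chosen purely for illustration.

```python
from collections import deque
from typing import Dict, List, Set


def has_path(graph: Dict[str, List[str]], start: str, targets: Set[str]) -> bool:
    """Breadth-first search: can any target be reached from start?"""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node in targets:
            return True
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False


def prioritize(findings: List[dict], graph: Dict[str, List[str]],
               critical_assets: Set[str]) -> List[dict]:
    """Keep findings that are known to be exploitable AND sit on a device
    with a path to a critical asset (hypothetical rule)."""
    return [
        f for f in findings
        if f["known_exploitable"] and has_path(graph, f["device"], critical_assets)
    ]


# Hypothetical environment: an isolated kiosk and a laptop that can reach db-prod.
graph = {"kiosk": [], "laptop-7": ["file-server"], "file-server": ["db-prod"]}
critical_assets = {"db-prod"}
findings = [
    {"id": "finding-1", "device": "kiosk", "known_exploitable": True},
    {"id": "finding-2", "device": "laptop-7", "known_exploitable": True},
    {"id": "finding-3", "device": "laptop-7", "known_exploitable": False},
]

urgent_ids = [f["id"] for f in prioritize(findings, graph, critical_assets)]
print(urgent_ids)
```

The kiosk finding is exploitable but leads nowhere, so only the exploitable finding on the laptop, which has a path to the critical asset, is kept.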

Products from manufacturers such as Tenable, Rapid7, Qualys, and Microsoft provide this sort of data and are supported by Epiphany for collection.

Network

The network is the highway that interconnects devices, so it is a logical correlation point between all the devices on the network. Note that the Epiphany data set may contain network switches, routers, firewalls, and IoT devices, all of which communicate across this common backbone.

Consider this: a device that is recognized as highly exploitable introduces a risk to the environment. If that device can communicate with a critical asset, does that alone suggest that there is a real risk to the target? The answer is, “it depends.” There are follow-up questions that Epiphany evaluates with its understanding of the network layer, including whether ports are open and whether an access control list (ACL) would block the traffic. Epiphany gathers specific data points from network devices, such as:

Network interconnections and routes
VLAN configurations
ACLs
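The follow-up questions about open ports and ACLs can be sketched as a simple reachability check. This is a minimal illustration, assuming first-match ACL evaluation with an implicit deny; the AclRule structure, the addresses, and the open-port table are hypothetical and do not represent any vendor's configuration format or Epiphany's analysis.

```python
import ipaddress
from dataclasses import dataclass
from typing import Dict, List, Optional, Set


@dataclass(frozen=True)
class AclRule:
    """A simplified ACL entry (hypothetical structure)."""
    action: str           # "permit" or "deny"
    src: str              # source network in CIDR notation
    dst: str              # destination network in CIDR notation
    port: Optional[int]   # None matches any port


def acl_allows(rules: List[AclRule], src_ip: str, dst_ip: str, port: int) -> bool:
    """Evaluate rules in order; the first match decides. No match means deny."""
    src = ipaddress.ip_address(src_ip)
    dst = ipaddress.ip_address(dst_ip)
    for rule in rules:
        if (src in ipaddress.ip_network(rule.src)
                and dst in ipaddress.ip_network(rule.dst)
                and rule.port in (None, port)):
            return rule.action == "permit"
    return False


def reachable(rules: List[AclRule], open_ports: Dict[str, Set[int]],
              src_ip: str, dst_ip: str, port: int) -> bool:
    """Traffic gets through only if the port is open AND the ACL permits it."""
    return port in open_ports.get(dst_ip, set()) and acl_allows(rules, src_ip, dst_ip, port)


rules = [
    AclRule("deny", "10.0.1.0/24", "10.0.9.10/32", 3389),
    AclRule("permit", "10.0.0.0/16", "10.0.9.0/24", None),
]
open_ports = {"10.0.9.10": {445, 3389}}  # assumed listening ports on the critical asset

rdp = reachable(rules, open_ports, "10.0.1.5", "10.0.9.10", 3389)
smb = reachable(rules, open_ports, "10.0.1.5", "10.0.9.10", 445)
ssh = reachable(rules, open_ports, "10.0.2.5", "10.0.9.10", 22)
print(rdp, smb, ssh)
```

Here the exploitable device can reach the critical asset over port 445 but not over 3389 (blocked by the ACL) or 22 (not open), which is why the answer to the question above is "it depends."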

