Implementing PR Diff Checks for Invisible Characters and Homoglyph Security Threats #2121

teodor-yanev · 2024-01-15T00:49:36Z

Description

Rule Type 1: Invisible Characters

A proposition to introduce a rule type in Minder that examines the diffs in pull requests for the presence of invisible characters. An example of a potentially dangerous insertion might be the inclusion of a Zero Width Space (\u200B) within variable names or string literals. This could lead to scenarios where let accountBalance = 1000; is visually indistinguishable from let account\u200BBalance = 1000;, the latter having a Zero Width Space between "account" and "Balance". Such modifications can create logic bombs or vulnerabilities that are hard to detect visually and could be exploited maliciously.

Invisible characters are a substantial threat because they can be used to alter the control flow of the code, introduce bugs that are difficult to trace or bypass security measures that rely on text comparison. The map of invisible characters will be derived from the database provided at Invisible Characters, excluding '\u0009' (Character Tabulation or Tab) and '\u0020' (Space) as these common ASCII characters are not generally considered malicious.

Rule Type 2: Homoglyphs / Mixed-Scripting

The second rule type aims to detect and flag the use of homoglyphs and mixed-script content within pull request diffs. For instance, consider a Python variable reassignment from password to passwоrd, where the second 'o' is actually a Cyrillic 'о'. This can cause a variable to be reassigned unexpectedly, leading to security lapses or data corruption.

Such homoglyphs are potentially dangerous as they can be used in phishing attacks, similar to the "internationalised domain name (IDN) homograph attack", where users are tricked into clicking on malicious links that appear legitimate due to the use of homoglyphs. More details on this can be found on the IDN Homograph Attack Wikipedia Page. Script spoofing, related to Homoglyphs, is a subtle yet powerful way to introduce code that appears harmless but is malicious in nature, bypassing visual inspections and potentially automated checks that don't account for such nuances.

The necessity and methodology for combating these threats are well-documented by the Unicode Consortium. Their report on Mixed Script Detection emphasises the importance of identifying text with mixed scripts as a security measure to prevent deceptive use of similar-looking characters from different scripts. We'll utilise the latest official Unicode Scripts database (Scripts.txt) to create an in-memory map for our mixed-script detection mechanism, enabling efficient and accurate identification of such issues. Having optimised this by removing redundant information like comments and Script-type descriptions, this fully populated in-memory map reached ~5.4MB during testing, which is a small price to pay for an O(1) access per rune and a linear time for traversing the whole PR diff in terms of performance.

Action Items

To implement these security checks, the following actions are proposed:

Refactor the Ingestor in Minder: Our current Ingestor needs to be reworked to allow for comprehensive processing of pull request diffs (proto types, etc).
Development of Rule Types:

Malicious Invisible Characters: Create a rule type that identifies and flags the use of non-ASCII invisible characters within code, excluding common whitespace characters like tabs and spaces.
Homoglyphs / Mixed-Scripts: Develop a rule type that detects the use of homoglyphs or mixed-script characters which may be intended to mislead reviewers or automate systems.

The text was updated successfully, but these errors were encountered:

teodor-yanev self-assigned this Jan 15, 2024

teodor-yanev mentioned this issue Feb 8, 2024

Add Homoglyphs detection in Minder #2312

Merged

teodor-yanev closed this as completed in #2312 Feb 11, 2024

This was referenced Feb 23, 2024

Add homoglyphs rule types mindersec/minder-rules-and-profiles#51

Merged

Homoglyphs Blog Post #2430

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Implementing PR Diff Checks for Invisible Characters and Homoglyph Security Threats #2121

Implementing PR Diff Checks for Invisible Characters and Homoglyph Security Threats #2121

teodor-yanev commented Jan 15, 2024

Implementing PR Diff Checks for Invisible Characters and Homoglyph Security Threats #2121

Implementing PR Diff Checks for Invisible Characters and Homoglyph Security Threats #2121

Comments

teodor-yanev commented Jan 15, 2024

Description

Rule Type 1: Invisible Characters

Rule Type 2: Homoglyphs / Mixed-Scripting

Action Items