Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implementing PR Diff Checks for Invisible Characters and Homoglyph Security Threats #2121

Closed
teodor-yanev opened this issue Jan 15, 2024 · 0 comments · Fixed by #2312
Closed
Assignees

Comments

@teodor-yanev
Copy link
Contributor

Description

Rule Type 1: Invisible Characters

A proposition to introduce a rule type in Minder that examines the diffs in pull requests for the presence of invisible characters. An example of a potentially dangerous insertion might be the inclusion of a Zero Width Space (\u200B) within variable names or string literals. This could lead to scenarios where let accountBalance = 1000; is visually indistinguishable from let account\u200BBalance = 1000;, the latter having a Zero Width Space between "account" and "Balance". Such modifications can create logic bombs or vulnerabilities that are hard to detect visually and could be exploited maliciously.

Invisible characters are a substantial threat because they can be used to alter the control flow of the code, introduce bugs that are difficult to trace or bypass security measures that rely on text comparison. The map of invisible characters will be derived from the database provided at Invisible Characters, excluding '\u0009' (Character Tabulation or Tab) and '\u0020' (Space) as these common ASCII characters are not generally considered malicious.

Rule Type 2: Homoglyphs / Mixed-Scripting

The second rule type aims to detect and flag the use of homoglyphs and mixed-script content within pull request diffs. For instance, consider a Python variable reassignment from password to passwоrd, where the second 'o' is actually a Cyrillic 'о'. This can cause a variable to be reassigned unexpectedly, leading to security lapses or data corruption.

Such homoglyphs are potentially dangerous as they can be used in phishing attacks, similar to the "internationalised domain name (IDN) homograph attack", where users are tricked into clicking on malicious links that appear legitimate due to the use of homoglyphs. More details on this can be found on the IDN Homograph Attack Wikipedia Page. Script spoofing, related to Homoglyphs, is a subtle yet powerful way to introduce code that appears harmless but is malicious in nature, bypassing visual inspections and potentially automated checks that don't account for such nuances.

The necessity and methodology for combating these threats are well-documented by the Unicode Consortium. Their report on Mixed Script Detection emphasises the importance of identifying text with mixed scripts as a security measure to prevent deceptive use of similar-looking characters from different scripts. We'll utilise the latest official Unicode Scripts database (Scripts.txt) to create an in-memory map for our mixed-script detection mechanism, enabling efficient and accurate identification of such issues. Having optimised this by removing redundant information like comments and Script-type descriptions, this fully populated in-memory map reached ~5.4MB during testing, which is a small price to pay for an O(1) access per rune and a linear time for traversing the whole PR diff in terms of performance.

Action Items

To implement these security checks, the following actions are proposed:

  1. Refactor the Ingestor in Minder: Our current Ingestor needs to be reworked to allow for comprehensive processing of pull request diffs (proto types, etc).

  2. Development of Rule Types:

  • Malicious Invisible Characters: Create a rule type that identifies and flags the use of non-ASCII invisible characters within code, excluding common whitespace characters like tabs and spaces.
  • Homoglyphs / Mixed-Scripts: Develop a rule type that detects the use of homoglyphs or mixed-script characters which may be intended to mislead reviewers or automate systems.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant