Information Diversity

Xinlan Emily Hu edited this page May 26, 2024 · 4 revisions

1. Feature Name

Information Diversity

2. Literature Source (Serial Number, link)

The feature is drawn primarily from the following paper: Riedl & Woolley (2016), "Teams vs. Crowds: A Field Test of the Relative Contribution of Incentives, Member Ability, and Emergent Collaboration to Crowd-Based Problem Solving Performance" (https://papers.ssrn.com/sol3/papers.cfm?abstract_id=2384068)

3. Description of how the feature is computed (In Layman’s terms)

This conversation-level feature uses topic modeling to measure the level of information diversity across a conversation. We first preprocess the data by lowercasing, lemmatizing, removing stop words, and removing short words (fewer than 3 characters). We then use the gensim package to fit an LDA model for each conversation, which yields a topic space with num_topics dimensions. The number of topics is set on a logarithmic scale relative to the number of chats in the conversation.
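The preprocessing and topic-count steps described above might be sketched as follows. This is a minimal, dependency-free illustration, not the feature's actual code: the stop-word list is a tiny placeholder, lemmatization and the gensim LDA fit are omitted, and `choose_num_topics` uses a hypothetical rule (ceiling of the natural log), since the exact logarithmic formula is not stated here.

```python
import math
import re
from typing import List

# Tiny illustrative stop-word list (assumption); a real pipeline would use a fuller one.
STOP_WORDS = {"the", "and", "for", "that", "this", "with", "are", "was", "you", "have"}


def preprocess(chat: str) -> List[str]:
    """Lowercase, tokenize, and drop stop words and words shorter than 3 characters.

    Lemmatization is intentionally left out to keep this sketch stdlib-only.
    """
    tokens = re.findall(r"[a-z]+", chat.lower())
    return [t for t in tokens if len(t) >= 3 and t not in STOP_WORDS]


def choose_num_topics(num_chats: int) -> int:
    """Hypothetical log-scale rule: ceil(ln(num_chats)), with a floor of 1 topic."""
    return max(1, math.ceil(math.log(max(num_chats, 1))))


tokens = preprocess("The cats are ON the mat, and it is sunny!")
num_topics = choose_num_topics(20)
```

The resulting token lists would then be passed to the LDA model, with `num_topics` as its dimensionality.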

A team's information diversity is then computed as the average cosine dissimilarity between each chat's topic vector and the mean topic vector across the entire conversation. The value ranges from 0 to 1, with higher values indicating that a more diverse set of topics was discussed over the conversation. As discussed in the paper above, typical information diversity values are quite small: the paper reports a mean score of 0.04 and a standard deviation of 0.05.
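As an illustration of the averaging step, here is a small sketch in plain Python. It assumes each chat has already been mapped to a dense topic vector of equal length (for example, per-topic probabilities from the LDA model); the function names `cosine_similarity` and `info_diversity` are hypothetical and chosen for this example only.

```python
import math
from typing import List, Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def info_diversity(topic_vectors: List[Sequence[float]]) -> float:
    """Average cosine dissimilarity between each chat vector and the mean vector."""
    if not topic_vectors:
        return 0.0
    n = len(topic_vectors)
    dims = len(topic_vectors[0])
    # Mean topic vector across all chats in the conversation.
    mean_vec = [sum(vec[d] for vec in topic_vectors) / n for d in range(dims)]
    # Dissimilarity is 1 minus cosine similarity.
    dissims = [1.0 - cosine_similarity(vec, mean_vec) for vec in topic_vectors]
    return sum(dissims) / n


same_topic = info_diversity([[0.5, 0.5], [0.5, 0.5]])   # chats share one topic mix
split_topics = info_diversity([[1.0, 0.0], [0.0, 1.0]])  # chats on disjoint topics
```

Because topic weights are non-negative, each cosine similarity lies between 0 and 1, so the averaged dissimilarity does too. Conversations whose chats share a topic mix score near 0, while chats concentrated on different topics push the score upward.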

4. Algorithms used (KNN, Logistic Regression etc.)

N/A

5. ML Inputs/Features

N/A

6. Statistical concepts used

N/A

7. Pages of the literature to be referred to for details

The metric calculation is described on pages 16-19; relevant/expected values are given in Table 3.

8. Any tweaks/changes/adaptions made from the original source