Information Diversity
The main source for this feature is the following paper: Teams vs. Crowds: A Field Test of the Relative Contribution of Incentives, Member Ability, and Emergent Collaboration to Crowd-Based Problem Solving Performance, by Riedl & Woolley (2016) (https://papers.ssrn.com/sol3/papers.cfm?abstract_id=2384068)
This conversation-level feature uses topic modeling to measure information diversity across a conversation. We first preprocess the data by lowercasing, lemmatizing, removing stop words, and removing short words (fewer than 3 characters). We then use the gensim package to fit an LDA model for each conversation, which yields a topic space with num_topics dimensions. The number of topics scales logarithmically with the number of chats in the conversation.
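The preprocessing and topic-count steps can be sketched roughly as follows. This is a minimal, dependency-free illustration, not the feature's actual code: the stop-word list, the `preprocess` and `choose_num_topics` helpers, and the specific logarithmic rule (natural log, rounded up, with a floor of 2) are all assumptions made for the example, and lemmatization is left out to avoid external libraries. The resulting token lists are what would then be passed to gensim to build the LDA model.

```python
import math
import re

# Tiny illustrative stop-word list (an assumption; a real pipeline would use a fuller one).
STOP_WORDS = {"the", "and", "for", "that", "this", "with", "are", "was", "you", "have"}


def preprocess(chat):
    """Lowercase a chat, split it into alphabetic tokens, and drop
    stop words and words shorter than 3 characters.

    Lemmatization is intentionally omitted to keep the sketch dependency-free.
    """
    tokens = re.findall(r"[a-z]+", chat.lower())
    return [t for t in tokens if len(t) >= 3 and t not in STOP_WORDS]


def choose_num_topics(num_chats):
    """Hypothetical logarithmic rule: natural log of the chat count,
    rounded up, with a floor of 2 topics."""
    if num_chats < 1:
        raise ValueError("a conversation needs at least one chat")
    return max(2, math.ceil(math.log(num_chats)))


if __name__ == "__main__":
    chats = ["The cats are on a mat, and it is FUN!", "Dogs chase the cats."]
    print([preprocess(c) for c in chats])
    print(choose_num_topics(len(chats)))
```

A logarithmic rule keeps the topic space small even for long conversations, which is the behavior described above; the base and the floor used here are placeholders.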
A team's info diversity is then computed as the average cosine dissimilarity between each chat's topic vector and the mean topic vector across the entire conversation. The value ranges from 0 to 1, with higher values indicating greater diversity in the topics discussed throughout the conversation. As discussed in the paper above, typical info diversity values are quite small: the paper reports a mean score of 0.04 and a standard deviation of 0.05.
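As a rough illustration of the calculation just described, the sketch below takes one topic vector per chat (for example, the per-chat topic probabilities from the LDA model) and averages each vector's cosine dissimilarity to the conversation's mean vector. It assumes cosine dissimilarity means 1 minus cosine similarity and that topic weights are non-negative, which keeps the result between 0 and 1. The function names are hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def info_diversity(topic_vectors):
    """Average cosine dissimilarity (1 - cosine similarity) between each
    chat's topic vector and the mean topic vector of the conversation."""
    if not topic_vectors:
        raise ValueError("need at least one topic vector")
    n = len(topic_vectors)
    dims = len(topic_vectors[0])
    # Mean topic vector across all chats in the conversation.
    mean_vec = [sum(vec[d] for vec in topic_vectors) / n for d in range(dims)]
    dissimilarities = [1.0 - cosine_similarity(vec, mean_vec) for vec in topic_vectors]
    return sum(dissimilarities) / n


if __name__ == "__main__":
    # Chats that all share one topic mix give a diversity of (approximately) 0.
    print(info_diversity([[0.5, 0.5], [0.5, 0.5]]))
    # Chats concentrated on different topics give a larger value.
    print(info_diversity([[1.0, 0.0], [0.0, 1.0]]))
```

Because every chat is compared to the conversation mean rather than to every other chat, the score stays low unless individual chats diverge sharply from the overall topic mix, which is consistent with the small typical values noted above.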
In the paper, the metric calculation is described on pages 16-19, and relevant/expected values are given in Table 3.