edX is a massive open online course (MOOC) provider founded by Harvard and MIT. It hosts a wide range of online university-level courses across many disciplines. This project analyzes the edX dataset to provide insights and answer specific research questions. This README gives a brief overview of the project and its components, along with instructions for running the code and reproducing the results.
The dataset used in this project is "Edx Courses" and contains information about 976 courses listed on the edx.org platform. The main objectives of this project are to:
- Analyze the data to gain insights and identify trends/patterns
- Answer specific research questions based on the data
- Communicate the findings through visualizations and reports
To run the code for this project, follow these steps:
- Clone the repository to your local machine.
- Install the necessary dependencies (e.g., Python packages) as specified in the requirements.txt file.
- Run the Jupyter notebook(s) in the repository to reproduce the analysis and generate the visualizations. Some of the data files may be large, so loading them can take a while.
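Before opening the notebooks, it can help to confirm that the dependencies from the steps above are actually installed. The following is a minimal sketch, not part of the project itself: the helper names `parse_requirement_names` and `missing_packages` are hypothetical, and the parser assumes simple `name==version` style lines rather than the full requirements file syntax.

```python
import re
from importlib import metadata
from pathlib import Path


def parse_requirement_names(text):
    """Return package names from requirements-style text.

    Assumption: lines look like `name`, `name==1.0`, or `name>=1.0`.
    Comments, blank lines, and option lines (starting with "-") are skipped.
    """
    names = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop trailing comments
        if not line or line.startswith("-"):
            continue
        match = re.match(r"[A-Za-z0-9][A-Za-z0-9._-]*", line)
        if match:
            names.append(match.group(0))
    return names


def missing_packages(names):
    """Return the subset of names with no installed distribution."""
    missing = []
    for name in names:
        try:
            metadata.version(name)
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    req = Path("requirements.txt")  # file named in the steps above
    if req.exists():
        absent = missing_packages(parse_requirement_names(req.read_text()))
        print("Missing packages:", absent or "none")
    else:
        print("requirements.txt not found in the current directory")
```

Run the script from the repository root. Any package it reports as missing would need to be installed before the notebooks can run.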
The project has four components: data cleaning and preprocessing, data exploration and visualization, data analysis and modeling, and reporting of the findings. To reproduce the analysis in the Jupyter notebooks, follow the steps outlined in the Running the Code section.
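As an illustration of what a first data cleaning pass might look like, the sketch below counts empty values per column in a CSV file. It is not the code used in the notebooks: the function names are hypothetical, it uses only the standard library, and it makes no assumption about the dataset's file name or column names.

```python
import csv
from collections import Counter


def summarize_missing(rows):
    """Count empty or whitespace-only values per column.

    `rows` is any iterable of dicts, such as a csv.DictReader.
    Returns (number of rows, {column: count of missing values}).
    """
    missing = Counter()
    total = 0
    for row in rows:
        total += 1
        for column, value in row.items():
            if column is None:
                continue  # extra fields beyond the header row
            if value is None or not value.strip():
                missing[column] += 1
    return total, dict(missing)


def summarize_csv(path):
    """Open a CSV file and summarize its missing values."""
    with open(path, newline="", encoding="utf-8") as handle:
        return summarize_missing(csv.DictReader(handle))
```

Calling `summarize_csv` with the path to the downloaded dataset file returns the row count and a per-column tally, which gives a quick indication of which fields need attention during preprocessing.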