The Department of Informatics and Analytics at Dana-Farber Cancer Institute seeks a motivated and talented Artificial Intelligence and Data Engineering Intern for our expanding AI & data team. Successful candidates will have the opportunity to contribute to high-impact healthcare projects at the cutting edge of cancer research and clinical care.

Project Objective

AI Studio serves as Dana-Farber’s centralized platform for AI model development and deployment, built primarily on Databricks. As the platform's user base grows, the AI Studio team needs effective ways to 1) track expenses and 2) monitor account usage. Most activities are conducted on Databricks, which logs usage data automatically. By leveraging Databricks' dashboarding capabilities, we can build visualizations that monitor cost and usage effectively.

Requirements:
1. Team usage/cost
   a. AI Studio users are generally organized into teams. For example, the admins of AI Studio form the “AI in Production” team. We aim to track which teams use the most compute resources and incur the highest costs.
2. Project usage/cost
   a. Users can also be grouped by project, and a project may include members from the same team or from different teams. Dashboards should reflect usage and costs at the project level.
3. Dashboard automation
   a. The dashboards built for teams and projects should be general enough to apply to many kinds of teams and projects while still tracking cost and usage effectively.
   b. The intern will develop a script that generates new dashboards for projects or teams. Its parameters should include the project or team name, its users, and relevant tags.
   c. While Databricks logging captures most key parameters, some groupings, such as projects, require finer granularity than the logs provide. Tags can be used to supply this additional information.
4. Stretch goal: Model forecasting
   a. In addition to tracking cost, we want to estimate future costs.
      If time permits, the intern may develop and implement a cost forecasting model.

Approach:
1. Prerequisites
   a. Ensure all workspace items are appropriately tagged by users.
   b. Onboard the intern to Databricks.
   c. Provide the intern with access to the system access table in Unity Catalog.
   d. The intern should have proficient knowledge of SQL and Python.
2. Data handling
   a. The intern will query logging data from the system access table and user data from the tracking tables.
   b. The intern will use SQL or Python to construct data tables.
3. Development
   a. The intern will define the minimum set of parameters needed to automate the process.
   b. The intern will use the defined parameters to create two scripts: one to automate team assessments and one to automate project assessments.
4. Stretch
   a. Once these are automated, the intern may explore potential forecasting models.
   b. If a suitable model is identified, the intern can use Databricks notebooks to train and test it.

The Basics

This role is intended to be part-time (