Here you can find the project checklist, which covers questions and topics to consider when starting a new ML project or at any point during its lifecycle.

Business need

  • What problems are we trying to solve? Is the problem worth solving?
  • Who is the customer (business owner) who will make the final decision about the solution?
  • Who are the stakeholders and end-users for the solution?
  • Does the problem require a data-driven / machine learning solution, or can it be solved with traditional software methods?
  • Are the expectations realistic?
  • Do we have the necessary people and resources allocated for this - data engineering, model development, and IT operations?
  • Change management and training needs - how will end users be impacted by the solution?
  • Once the model is in production, how will we collect feedback?

Data

  • What data is needed to train the model?
  • What data are we missing? What additional data is needed for proposed projects?
  • What is the data quality?
  • Data volume - several factors influence the data volume required for a model: accuracy requirements, linear vs. non-linear models, and the number of features
  • Data variety - is the dataset representative of various business scenarios and use cases?
  • Is the data available and accessible? APIs and machine readable?
  • Where and how will the data be stored and managed during and after the project?
  • What are the possible constraints or challenges in accessing or incorporating this data?
  • What regulatory constraints exist on data collection, analysis, or implementation? Has the specific legislation and precedents been examined recently? What workarounds might exist?
  • What changes to data collection, coding, or integration are needed?
  • How will we version control the data?
  • Have we automated all the necessary data processing steps?
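
The data versioning and quality questions above can be made concrete by encoding them as code. A minimal sketch using only the Python standard library - the column names and the specific checks are hypothetical examples, not a prescribed implementation:

```python
import csv
import hashlib
import io

def dataset_fingerprint(csv_text: str) -> str:
    """Return a short content hash that can serve as a dataset version ID."""
    return hashlib.sha256(csv_text.encode("utf-8")).hexdigest()[:12]

def quality_report(csv_text: str, required_columns: list[str]) -> dict:
    """Run basic quality checks: schema presence, row count, empty cells."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    missing_columns = [c for c in required_columns if rows and c not in rows[0]]
    empty_cells = sum(
        1 for row in rows for c in required_columns
        if c in row and (row[c] is None or row[c].strip() == "")
    )
    return {
        "version": dataset_fingerprint(csv_text),
        "rows": len(rows),
        "missing_columns": missing_columns,
        "empty_cells": empty_cells,
    }

# Hypothetical sample extract with one missing value.
sample = "customer_id,age,churned\n1,34,0\n2,,1\n3,41,0\n"
report = quality_report(sample, ["customer_id", "age", "churned"])
```

Logging the fingerprint alongside each trained model is one lightweight way to answer "which data produced this model?" without a full data-versioning tool.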

Model training and development

  • What kind of models are suitable for the problem? Have we tried these models before?
  • What tools and libraries are used to build the model?
  • What kind of development environment is suitable for developing the model? Local vs. cloud environments?
  • Does the model training require heavy processing? Special capacity / memory requirements?
  • How do you know whether a model is “good enough”? What metrics will be used to evaluate performance?
  • When and how often should the model be retrained? 
  • How will the model be tested, what test cases developed?
  • Have we automated all the necessary steps to be able to perform model re-training?
  • Are all model artifacts - scripts, data preparation code, and configurations - in version control?
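
The "good enough" question above is easiest to answer when the acceptance criterion is written down as code next to the model. A minimal sketch, assuming accuracy is the agreed metric and the baseline and margin values are hypothetical placeholders to be agreed with the business owner:

```python
def accuracy(y_true: list, y_pred: list) -> float:
    """Fraction of predictions that match the ground-truth labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def is_good_enough(y_true: list, y_pred: list,
                   baseline_acc: float, min_lift: float = 0.02) -> bool:
    """Accept a model only if it beats the agreed baseline by a margin."""
    return accuracy(y_true, y_pred) >= baseline_acc + min_lift

# Hypothetical hold-out evaluation: 7 of 8 predictions correct.
y_true = [0, 1, 1, 0, 1, 0, 1, 1]
y_pred = [0, 1, 1, 0, 0, 0, 1, 1]
```

Running the same gate in an automated retraining pipeline turns the checklist question into a repeatable pass/fail decision instead of a one-off judgment call.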

IT integrations and deployment

  • What IT systems are involved in the ML solution? 
  • How will the model be used and consumed? What APIs and integrations, real-time vs. batch?
  • What platform will host and serve the model in production environment?
  • What IT integrations are required? How will these integrations be implemented?
  • What implementation challenges may occur?
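
The real-time vs. batch question above often reduces to wrapping the same scoring function in two entry points. A sketch under stated assumptions - the field name and the stand-in rule are hypothetical, and a real deployment would call a trained model behind these functions:

```python
from typing import Callable

def score_one(record: dict, model: Callable[[dict], float]) -> dict:
    """Real-time path: score a single record, e.g. behind a REST endpoint."""
    return {**record, "score": model(record)}

def score_batch(records: list[dict],
                model: Callable[[dict], float]) -> list[dict]:
    """Batch path: score a whole extract, e.g. in a nightly scheduled job."""
    return [score_one(r, model) for r in records]

def toy_model(record: dict) -> float:
    # Stand-in for a trained model: a hand-written rule on a hypothetical field.
    return 0.9 if record["usage_minutes"] < 10 else 0.1

batch = score_batch([{"usage_minutes": 5}, {"usage_minutes": 120}], toy_model)
```

Keeping one scoring function shared by both paths avoids the common failure mode where batch and real-time integrations drift apart over time.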

Operations and maintenance

  • When and how are the results and models transferred from development to operations teams? Are responsibilities clear?
  • How is the performance of models tracked? 
  • When and how often should the model be retrained?
  • Who is the support organization for the ML solution?
  • When is code refactoring performed? How are the correctness and performance of models maintained and validated during refactoring?
  • How are user support requests logged and managed?
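
Tracking model performance in production often starts with simple input-drift checks, since true labels may arrive late or never. A minimal sketch, assuming a hypothetical relative mean-shift threshold agreed with the operations team:

```python
def mean(values: list[float]) -> float:
    """Arithmetic mean of a non-empty list."""
    return sum(values) / len(values)

def drift_alert(train_values: list[float], prod_values: list[float],
                rel_threshold: float = 0.25) -> bool:
    """Flag a feature if its production mean drifts too far from training."""
    base = mean(train_values)
    shift = abs(mean(prod_values) - base) / abs(base)
    return shift > rel_threshold

# Hypothetical feature: customer age at training time vs. in production.
train_ages = [30, 35, 40, 45, 50]   # mean 40
prod_ages = [52, 55, 58, 60, 65]    # mean 58, a 45% relative shift
```

An alert like this can feed the retraining schedule asked about above: retrain on a fixed cadence, or earlier when drift is detected.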


Constraints

For each project being considered, enumerate potential constraints that may impact its success, e.g.:

  • What organizational constraints exist, including culture, skills, or structure?
  • What management constraints are there?
  • Are there any past analytics projects that may influence how the organization views data-driven approaches?