The RAP community of practice includes all analysts and data scientists who are interested in adopting the working practices included in reproducible analytical pipelines (RAP) at NHS Digital.

Overview

Warning - this repository is a snapshot of a repository internal to NHS Digital. This means that links to videos and some URLs may not work.

Repository owner: NHS Digital Analytical Services

Email: [email protected]

To contact us, raise an issue on GitHub or email us and we will respond promptly.

RAP community of practice

Welcome to the landing page for the RAP community of practice repo.

You can learn all about reproducible analytical pipelines (RAP) on our what is RAP page. In a nutshell, though, RAP is becoming the standard for publishing analytical outputs in government. RAP combines a number of ways of working that help to improve the reliability, transparency, and speed of statistics publications. Reproducible analytical pipelines follow the principles of the AQUA Book guidelines, which revolve around analysis being reproducible, auditable, transparent, and quality assured.

The RAP community of practice includes all analysts and data scientists who are interested in adopting the working practices included in reproducible analytical pipelines (RAP). This repo is a central repository for resources and guidance to help teams adopting RAP practices. There is an associated [MS Teams page] where you can introduce yourself, ask for help, or discuss different approaches. Over time we hope to build up a community of people who can self-support and further develop these ways of working.

The community of practice aims to support teams in adopting RAP practices through:

  1. Offering in-person support as teams establish new working practices
  2. Producing learning materials that offer reusable templates adapted for the NHSD analytical environment

This work is prompted by the observation that teams can struggle to adopt RAP practices without direct support. While no single element of RAP is particularly difficult, learning several new skills while also delivering BAU is challenging, and teams can struggle to find the defended time to embed these practices. See the Statistics Authority report on the barriers to RAP adoption for more information. Luckily, in NHSD we have strong senior support for RAP, and many teams have already adopted many of the practices included in RAP. Consequently, we already have a large pool of skilled, enthusiastic analysts who are willing to help others. These resources also aim to support the goals laid out in the Goldacre report, Bringing NHS data analysis into the 21st century, and to align with Tim Berners-Lee's five-star data principles.

Support and training

If your team is embarking upon a RAP journey, you should look at our what is RAP page and try to complete the self-assessment. From there, we recommend reaching out for some in-person support. The RAP Champion Function (within the Data Science Skilled Team) can offer support in many forms:

  • Reviewing your RAP work and assessing your progress against the levels of RAP
  • Peer review of code
  • Workshops for a specific RAP capability
  • Consultancy style engagement where we plan a migration strategy
  • Pair coding
  • Shadowing another team

If you want to talk about any of this then please reach out on the [RAP community of practice MS Teams] page (internal to NHSD).

We maintain a list of people who are willing to dedicate some time to support others. Please add your name to the mix if you are willing to support someone else. You don't need to be an expert - just willing to share what you know.

Tutorials and resources

As we work alongside teams, we try to produce reusable learning materials aimed specifically at supporting NHSD teams. We try (with partial success) to avoid reproducing guidance that is easily available online; instead, we link to lots of external resources where you can self-serve. Our own focus is on creating bespoke guidance that lays out how you would apply these practices in the NHSD setting.

Here are some of the initial resources:

These resources are demand-driven, so if you want something, please ask on the [MS Teams page]. We would also ask you to contribute if you can improve any of the resources or fill in any other gaps.

The resources are not intended to be prescriptive. There are many ways to accomplish a task and teams have valid reasons for choosing other approaches. Instead the intention of the resources provided here is to offer a way in for teams who want to adopt good practices that they have heard about but don't know where to start.

Misc

We have taken inspiration from the NHSD software engineering CoP, which has lots of great material; I encourage you to read it and reflect on its working practices.

Licence

RAP Community of Practice codebase is released under the MIT License.

The documentation is © Crown copyright and available under the terms of the Open Government Licence v3.0.

Comments
  • Dead link

    opened by abbieprescott 4
  • dependency management "not possible in DAE"

    In "Levels of RAP" it says: "Does your repo include dependency management? (i.e. requirements.txt or conda environment for RDS users. Not possible in DAE)"

    It's not strictly true that dependencies cannot be described for DAE, though the options are more limited: one can describe the cluster used (runtime, libraries, etc.).

    opened by SamHollings 2
  • RAP Publishing Checks - Clarify what credentials and secrets are

    We've had some feedback that the part of the publishing checks that says "no credentials or secrets" is not clear, as analysts have not seen these terms before.

    The following text might make things easier to understand:

    Credentials or secrets are essentially passwords that computers use for encrypted communication or for access to services. For example, with many APIs (like the Google Maps API) you must supply a credential code to access the service. These codes often look like long, strange combinations of letters and numbers (l79sDgH9s...). Just as we must not share our passwords publicly, you should not commit credentials or secrets to a repository.

    opened by goodyguts 2
  • Environment and dependency management - needs to be clearer

    In the "levels of RAP", people are confused by environment and dependency management. We need to link to a page that clearly describes these, explains what the point of them is, and shows how teams can tell whether they are meeting this requirement.

    opened by SamHollings 2
  • Pyspark guidance

    I'm not a fan of referring to it as a "flavour of Python" (on the "About PySpark" page).

    I think PySpark should sit underneath Python.

    I also think it should make clear that processing is only distributed if Spark is set up correctly: Spark on a normal laptop will not be any more powerful than, say, pandas, whereas on a big cluster in Databricks it is a different story.

    I think this page might also need a reference to other Python data structures, and to the idea that there is a right tool for each job.

    duplicate 
    opened by SamHollings 1
  • Split out Terminal guidance from "git" guidance.

    The terminal guidance is contained within the git guidance, but the terminal is a separate tool that can be used for many purposes. It would probably be better to give it its own section at the same level as Python, git, etc., and have the pages for those other technologies reference it.

    opened by SamHollings 1
  • code in the open - topics and add to data-analytics-services

    On the "how to publish your code in the open" page, we should tell people to add their publication to https://github.com/NHSDigital/data-analytics-services and to set appropriate topics for their publication, e.g. nhs-digital-publication.

    opened by SamHollings 1
  • Signpost resources to ensure accessibility requirements are met

    This is most relevant for any outputs produced. See guidance.

    As a starting point, the Python visualisation guide should include tips on how to make visualisations more accessible:

    • The Home Office has some posters on accessible design
    • There are also countless online resources on accessibility relating to colour-blindness, visual impairments etc.

    We should also consider including a note on accessibility in the design of RAP. A pipeline would be difficult to reproduce if a user could not access some part of it. This applies to README files as well as to output types.

    opened by harrietrs 1
  • Environment management external links

    We should do more to explain how environment management plays into reproducibility.

    This page is quite useful and would save us duplicating: https://realpython.com/python-virtual-environments-a-primer/

    opened by connor1q 1
  • Broken link

    https://github.com/NHSDigital/rap-community-of-practice/blob/main/python/project-structure-and-packaging.md#generic-package-template

    There is a broken link to the generic package template in the section linked above.

    opened by connor1q 1
  • Contributions section

    We're keen to encourage external improvements to these resources but we don't yet have a contributions section that explains how we will review and moderate.

    opened by connor1q 1
  • Code review page ideas

    We have recently been doing some code reviewing. Here are a few things that we think might make the page more helpful.

    Code review before merge request

    Code should be reviewed with someone before submitting a merge request. The reviewer should consider whether the code needs to be refactored or redesigned.

    I'm not sure that I always agree with this. Merge requests make it really easy to leave comments on different parts of the code, and in some ways make life easier for both the reviewer and the person submitting the merge request. Maybe rephrase it as:

    You don't have to save reviewing your code until the end. You can do small reviewing and also pair programming while developing the ticket. Seeking feedback sooner could mean you save time because you do not have to change as much when the final review happens later.

    Different types of code review

    There are different types of code review that you can get. It may be worth highlighting them.

    1. Merge request code review

      A standard review process that checks whether changes to the codebase are acceptable. You focus only on the code that has changed. It should be relatively quick, and very regular (one every time you implement a new feature). Normally done by a member of the team.

    2. Full code review

      A code review where someone looks at all your code together, and gives you overall feedback. This review allows someone to look at the bigger picture, rather than one individual feature. These reviews take longer, and are less regular. Normally done by members outside your team, so that it is a fresh pair of eyes.

    3. Fitness to publish checks

      A code review to check the code is okay to publish. Note that, in the code review, you will normally limit yourself to making suggestions that you want completed before the code is published. This may mean you avoid suggesting big changes to the code, and instead focus in on checks like ensuring documentation is well written, or removing passwords from the code.

    Maybe split code review checklist into beginner and advanced items?

    One of the items on the code review checklist is

    Documentation is hosted for easy access. GitHub Pages and Read the Docs provide a free service for hosting documentation publicly.

    Even among advanced teams in data services, I do not see this being done. It might be worth prioritising the items so that the checklist is less overwhelming.

    Maybe organise the checklist items by the RAP level the team is aiming for.

    on jira workplan 
    opened by goodyguts 2
  • 03_quality-assuring-analytical-ouputs page not clearly linked with levels of RAP

    The AQUA page (https://github.com/NHSDigital/rap-community-of-practice/blob/main/implementing_RAP/general_guidance/quality-assuring-analytical-ouputs.md) is not clearly associated with the levels of RAP and so people can find it a bit confusing when and how they should be following it.

    We need to link it more clearly into people's workflow when planning out RAP (some of it goes beyond RAP and is more general guidance on managing analytical work), and perhaps reduce duplication by removing the parts already covered by the "levels of RAP" and making this clear.

    on jira workplan 
    opened by SamHollings 1
  • Clean code guidance

    Some teams want to use clean code. We need guidance on the best way to approach this for analytical code, why you would want to do it, and what to watch out for.

    on jira workplan 
    opened by SamHollings 2
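The dependency management issue above points out that, even in DAE, an environment can be described by recording the runtime and libraries in use. As a minimal sketch (standard library only, Python 3.9+ assumed; the function names `installed_pins` and `render_requirements` are hypothetical), the snippet below lists every package visible to the current interpreter in requirements.txt style, headed by the Python version. It records what is installed rather than controlling it, so it documents an environment but is not a substitute for a managed requirements.txt or conda environment.

```python
import platform
from importlib import metadata


def installed_pins() -> list[str]:
    """Return sorted 'name==version' pins for every distribution this interpreter can see."""
    pins = set()
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if name:  # skip broken installs that carry no name metadata
            pins.add(f"{name}=={dist.version}")
    # Case-insensitive sort so the file diffs cleanly between runs.
    return sorted(pins, key=str.lower)


def render_requirements() -> str:
    """Build requirements.txt-style text with the Python version as a header comment."""
    lines = [f"# python {platform.python_version()}"] + installed_pins()
    return "\n".join(lines) + "\n"
```

For example, writing the output of `render_requirements()` to a file next to the code at the end of a run would leave a committed record of the packages the pipeline ran against. Outside DAE, `pip freeze` produces a similar listing from the command line.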
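The "RAP Publishing Checks" issue above proposes explaining credentials and secrets in plain terms. One common way to keep them out of committed code is to read them from the environment when the code runs. The sketch below assumes that approach; `get_secret` and `MissingSecretError` are hypothetical names, and a team might equally use a secrets manager or a git-ignored configuration file.

```python
import os


class MissingSecretError(RuntimeError):
    """Raised when a required credential has not been supplied."""


def get_secret(name: str) -> str:
    """Return the credential stored in the environment variable `name`.

    Because the value is read at run time, it never has to be written
    into a file that could be committed and published.
    """
    value = os.environ.get(name)
    if not value:
        # Fail loudly rather than falling back to a hard-coded value.
        raise MissingSecretError(
            f"Set the {name} environment variable; do not hard-code it."
        )
    return value
```

For example, `api_key = get_secret("MAPS_API_KEY")` (a hypothetical variable name) replaces a line such as `api_key = "l79sDgH9s..."`, which would expose the key as soon as the repository is made public.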
Releases (v1.1.0)
  • v1.1.0 (Dec 21, 2022)

    What's Changed

    Automatic Release Notes

    • Release v1.1.0 by @xiyaozhuang in https://github.com/NHSDigital/rap-community-of-practice/pull/35

    New Contributors

    • @xiyaozhuang made their first contribution in https://github.com/NHSDigital/rap-community-of-practice/pull/35

    Full Changelog: https://github.com/NHSDigital/rap-community-of-practice/compare/v1.0.0...v1.1.0

  • v1.0.0 (Dec 6, 2022)

    What Changed

    Automatic release notes

    • Hr 1188 r git by @helrich in https://github.com/NHSDigital/rap-community-of-practice/pull/2
    • Add Intro to R link by @helrich in https://github.com/NHSDigital/rap-community-of-practice/pull/3
    • Improving layout and expanding rollout section by @connor1q in https://github.com/NHSDigital/rap-community-of-practice/pull/4
    • Cq updates by @connor1q in https://github.com/NHSDigital/rap-community-of-practice/pull/5
    • Hr changes by @helrich in https://github.com/NHSDigital/rap-community-of-practice/pull/9
    • Hr updates to git by @helrich in https://github.com/NHSDigital/rap-community-of-practice/pull/10
    • Update publishing code in the open by @harrietrs in https://github.com/NHSDigital/rap-community-of-practice/pull/20
    • Sh new front page by @SamHollings in https://github.com/NHSDigital/rap-community-of-practice/pull/22
    • Restructure and edit files by @abbieprescott in https://github.com/NHSDigital/rap-community-of-practice/pull/23
    • Create gh-pages version by @harrietrs in https://github.com/NHSDigital/rap-community-of-practice/pull/31
    • add two new guides and pr prep by @helrich in https://github.com/NHSDigital/rap-community-of-practice/pull/32
    • Publishes when to stop coding guide by @josephwilson8-nhs in https://github.com/NHSDigital/rap-community-of-practice/pull/33
    • Added new improved guides on virtual environments by @xiyaozhuang in https://github.com/NHSDigital/rap-community-of-practice/pull/34

    New Contributors

    • @helrich made their first contribution in https://github.com/NHSDigital/rap-community-of-practice/pull/2
    • @connor1q made their first contribution in https://github.com/NHSDigital/rap-community-of-practice/pull/4
    • @harrietrs made their first contribution in https://github.com/NHSDigital/rap-community-of-practice/pull/20
    • @SamHollings made their first contribution in https://github.com/NHSDigital/rap-community-of-practice/pull/22
    • @abbieprescott made their first contribution in https://github.com/NHSDigital/rap-community-of-practice/pull/23
    • @josephwilson8-nhs made their first contribution in https://github.com/NHSDigital/rap-community-of-practice/pull/33
    • @xiyaozhuang made their first contribution in https://github.com/NHSDigital/rap-community-of-practice/pull/34

    Full Changelog: https://github.com/NHSDigital/rap-community-of-practice/commits/v1.0.0

Owner
NHS Digital
NHS Digital Public Repository