Description

The proposed open-source dxpr package is a software tool aimed at expediting an integrated analysis of electronic health records (EHRs). By implementing dxpr package, it is easier to integrate, analyze, and visualize clinical data.

In this part, the instruction of how dxpr package workes with procedure records is provided.

Development version

install.packages("remotes")
# Install development version from GitHub
remotes::install_github("DHLab-TSENG/dxpr")
library(dxpr)

Data Format

dxpr (procedure part) is used to pre-process procedure codes of EHRs. To execute functions in dxpr, the EHR data input should be a data frame object in R, and contain at least three columns: patient ID, ICD procedure codes and date.

Column names or column order of these three columns does not need to necessarily follow a rule. Each column name will be pass as argument. Detailed information of required data type of every column and argument of functions can be found in the reference section.

Also, in the R ecosystem, DBI, odbc, and other packages provide access to databases within R. As long as the data is retrieved from databases to a data frame in R, the processes are the same as the following example.

con <- DBI::dbConnect(odbc::odbc(),
                      Driver   = "[your driver's name]",
                      Server   = "[your server's path]",
                      Database = "[your database's name]",
                      UID      = "[Database user]",
                      PWD      = "[Database password]")
dxDataFile <- dbSendQuery(con, "SELECT * FROM example_table WHERE ID = 'sampleID'")
dbFetch(dxDataFile)

Sample file

An example of the data shows as below:
This data is a simulated medical data set of 3 patients with 170 records.

head(samplePrFile)
#>    ID   ICD       Date
#> 1:  B  5681 2008-01-14
#> 2:  A  9774 2009-01-11
#> 3:  B 44.99 2009-05-10
#> 4:  C 07.59 2009-01-21
#> 5:  B  0205 2008-07-06
#> 6:  B  8812 2007-06-27

ICD-PCS code with two format

dxpr package uses ICD-PCS codes as procedure standard. There are two formats of ICD-9 procedure codes, decimal (with a decimal point separating the code) and short format. And ICD-10 is only with short format.

ICD-9-PCS

# ICD-9-PCS_Short
head(ICD9PrwithTwoFormat$Short)
#> [1] "0001" "0002" "0003" "0009" "0010" "0011"

# ICD-9-PCS_Decimal
head(ICD9PrwithTwoFormat$Decimal)
#> [1] "00.01" "00.02" "00.03" "00.09" "00.10" "00.11"

ICD-10-PCS: only short format

# ICD-10-PCS_Short
head(prICD10$ICD)
#> [1] "0016071" "0016072" "0016073" "0016074" "0016075" "0016076"

I. Code standardization

A. ICD-9-PCS code format transformation:Short <-> Decimal

dxpr package helps users to standardize the ICD-9 procedure codes into a uniform format before further code grouping. The formats used for different grouping methods are shown as Table 1.

Table 1 Format of code classification methods

ICD format
Clinical Classifications Software (CCS) short format
Procedure class short format

Since formats of ICD-9 codes used within a dataset could be different, users can standardize the codes through this function.

The function only standardizes ICD-9 codes. There are two ways to distinguish the version of ICD code (ICD-9/ICD-10) used in data: one is a specific extra column that records version used (data type in this column should be numeric 9 or 10), the other is a specific date that is the starting date of using ICD-10 in the dataset. For example, reimbursement claims with a date required to use ICD-10 codes in the United States and Taiwan are October 1st, 2015 and January 1st, 2016, respectively.

A-1. Uniform decimal format
decimal <- icdPrShortToDecimal(prDataFile = samplePrFile,
                               icdColName = ICD, 
                               dateColName = Date,
                               icd10usingDate = "2015/10/01")
head(decimal$ICD)
#>      ICD
#> 1: 56.81
#> 2: 97.74
#> 3: 44.99
#> 4: 07.59
#> 5: 02.05
#> 6: 88.12
A-2. Uniform short format

icdPrShortToDecimal function converts the procedure codes to the short format, which can be used for grouping to the other classification functions (icdPrToCCS, icdPrToCCSLvl and icdPrToProcedureClass).

short <- icdPrDecimalToShort(prDataFile = samplePrFile,
                             icdColName = ICD, 
                             dateColName = Date,
                             icd10usingDate = "2015/10/01")
head(short$ICD)
#>     ICD
#> 1: 5681
#> 2: 9774
#> 3: 4499
#> 4: 0759
#> 5: 0205
#> 6: 8812

Warning message

Besides, code standardization functions generate data of procedure codes with potential error to help researchers identify the potential coding mistake that may affect the result of following clinical data analysis.

There are two error type:wrong format and wrong version. The former one means the ICD code does not exist (maybe because ICD is wrongly coded or with a wrong place of decimal point). And the latter one means the version is wrong (still use ICD 9 after icd10usingDate, etc.).

Users can check data after receiving the warning message. researcher identify the potential coding mistake that may affect clinical data analysis.

II. Data integration

Functions stated below collapse ICD codes into a smaller number of clinically meaningful categories that are more useful for presenting descriptive statistics than individual ICD procedure codes are.

dxpr package supports two strategies to group EHR procedure codes, including CCS and procedure classes.

A. Clinical Classifications Software (CCS)

The CCS classification for ICD-9 and ICD-10 codes is a procedure categorization scheme that can employ in many types of projects analyzing data on procedures.

1) single-level: icdPrToCCS

Both ICD-9-PCS and ICD-10-PCS code contains 231 single-level CCS categories which can be corresponded with each other.

## ICD to CCS category
CCS <- icdPrToCCS(prDataFile = samplePrFile,
                  idColName = ID,
                  icdColName = ICD,        
                  dateColName = Date,
                  icd10usingDate = "2015-10-01",
                  isDescription = FALSE)

head(CCS$groupedDT, 5)
#>    Short ID   ICD       Date CCS_CATEGORY
#> 1:  5681  B  5681 2008-01-14          112
#> 2:  9774  A  9774 2009-01-11          131
#> 3:  4499  B 44.99 2009-05-10           94
#> 4:  0759  C 07.59 2009-01-21           12
#> 5:  0205  B  0205 2008-07-06            9

2) multi-level: icdPrToCCSLvl

Multi-level CCS in ICD-9-PCS has 3 levels, and multi-level CCS in ICD-10-PCS has two levels.

## ICD to CCS multiple level 2 description
CCSLvl <- icdPrToCCSLvl(prDataFile = samplePrFile,
                       idColName = ID,
                       icdColName = ICD,        
                       dateColName = Date,
                       icd10usingDate = "2015-10-01",
                       CCSLevel = 2,
                       isDescription = TRUE)

head(CCSLvl$groupedDT, 5)
#>    Short ID   ICD       Date                                    CCS_LVL_2_LABEL
#> 1:  5681  B  5681 2008-01-14   Other OR therapeutic procedures of urinary tract
#> 2:  9774  A  9774 2009-01-11 Other non-OR therapeutic procedures; female organs
#> 3:  4499  B 44.99 2009-05-10           Other OR upper GI therapeutic procedures
#> 4:  0759  C 07.59 2009-01-21             Other therapeutic endocrine procedures
#> 5:  0205  B  0205 2008-07-06     Other OR therapeutic nervous system procedures

B. Procedure Class

Procedure Classes are part of the family of databases and software tools developed as part of the Healthcare Cost and Utilization Project (HCUP) by AHRQ.
The Procedure Classes assign all ICD procedure codes to one of four categories:

  • Minor Diagnostic - Non-operating room procedures that are diagnostic
  • Minor Therapeutic - Non-operating room procedures that are therapeutic
  • Major Diagnostic - Operating room procedures that are diagnostic
  • Major Therapeutic - Operating room procedures that are therapeutic
ProcedureClass <- icdPrToProcedureClass(prDataFile = samplePrFile,
                                        idColName = ID,
                                        icdColName = ICD,      
                                        dateColName = Date,
                                        icd10usingDate = "2015-10-01",
                                        isDescription = FALSE)

head(ProcedureClass$groupedDT, 5)
#>    Short ID   ICD       Date PROCEDURE_CLASS
#> 1:  5681  B  5681 2008-01-14               4
#> 2:  9774  A  9774 2009-01-11               2
#> 3:  4499  B 44.99 2009-05-10               4
#> 4:  0759  C 07.59 2009-01-21               4
#> 5:  0205  B  0205 2008-07-06               4