The open-source dxpr package is a software tool that expedites integrated analysis of electronic health records (EHRs). With the dxpr package, it is easier to integrate, analyze, and visualize clinical data.
This part explains how the dxpr package works with procedure records.
install.packages("remotes")
# Install development version from GitHub
remotes::install_github("DHLab-TSENG/dxpr")
library(dxpr)
dxpr (procedure part) pre-processes the procedure codes in EHRs. To execute functions in dxpr, the EHR input data should be a data frame object in R and contain at least three columns: patient ID, ICD procedure code, and date.
The names and order of these three columns do not need to follow any rule, because each column name is passed as an argument. Detailed information on the required data type of each column and on the function arguments can be found in the reference section.
Also, in the R ecosystem, DBI, odbc, and other packages provide access to databases within R. As long as the data is retrieved from databases to a data frame in R, the processes are the same as the following example.
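As a minimal sketch of the expected input, the following base R code builds a small data frame with the three required columns. The object name `myPrData` and the check on `requiredCols` are hypothetical and only illustrate the shape of the data; they are not part of dxpr.

```r
# Hypothetical input: one row per procedure record.
# ICD codes are kept as character so that leading zeros are preserved.
myPrData <- data.frame(
  ID   = c("B", "A", "B", "C"),
  ICD  = c("5681", "9774", "44.99", "07.59"),
  Date = as.Date(c("2008-01-14", "2009-01-11", "2009-05-10", "2009-01-21")),
  stringsAsFactors = FALSE
)

# Basic sanity checks before passing the data to any grouping function.
requiredCols <- c("ID", "ICD", "Date")
stopifnot(all(requiredCols %in% names(myPrData)))
stopifnot(is.character(myPrData$ICD), inherits(myPrData$Date, "Date"))

str(myPrData)
```

Data retrieved from a database would need to end up in the same shape before being passed to the package functions.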
An example of the data is shown below:
This is a simulated medical data set of 3 patients with 170 records.
head(samplePrFile)
#> ID ICD Date
#> 1: B 5681 2008-01-14
#> 2: A 9774 2009-01-11
#> 3: B 44.99 2009-05-10
#> 4: C 07.59 2009-01-21
#> 5: B 0205 2008-07-06
#> 6: B 8812 2007-06-27
The dxpr package uses ICD-PCS codes as the procedure coding standard. ICD-9 procedure codes come in two formats: decimal (with a decimal point separating the code) and short. ICD-10 procedure codes have only the short format.
ICD-9-PCS
# ICD-9-PCS_Short
head(ICD9PrwithTwoFormat$Short)
#> [1] "0001" "0002" "0003" "0009" "0010" "0011"
# ICD-9-PCS_Decimal
head(ICD9PrwithTwoFormat$Decimal)
#> [1] "00.01" "00.02" "00.03" "00.09" "00.10" "00.11"
ICD-10-PCS: only short format
# ICD-10-PCS_Short
head(prICD10$ICD)
#> [1] "0016071" "0016072" "0016073" "0016074" "0016075" "0016076"
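The relationship between the two ICD-9 formats can be sketched in base R. The helpers `shortToDecimal` and `decimalToShort` below are hypothetical and are not the package's implementation; they assume that an ICD-9 procedure code has two digits before the decimal point and one or two digits after it.

```r
# Hypothetical helpers, not part of dxpr.
# Insert a decimal point after the first two digits of a short code.
shortToDecimal <- function(code) {
  sub("^([0-9]{2})([0-9]{1,2})$", "\\1.\\2", code)
}

# Drop the (first) decimal point of a decimal code.
decimalToShort <- function(code) {
  sub(".", "", code, fixed = TRUE)
}

shortCodes   <- c("0001", "0002", "5681")
decimalCodes <- shortToDecimal(shortCodes)
decimalCodes
#> [1] "00.01" "00.02" "56.81"
decimalToShort(decimalCodes)
#> [1] "0001" "0002" "5681"
```

Codes that are already in the target format pass through unchanged, which is why a mixed-format column can be converted in one step.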
The dxpr package helps users standardize ICD-9 procedure codes into a uniform format before code grouping. The formats used by the different grouping methods are shown in Table 1.
Table 1 Format of code classification methods
Classification method | ICD format |
---|---|
Clinical Classifications Software (CCS) | short format |
Procedure class | short format |
Since the ICD-9 codes within a dataset may be in different formats, users can standardize them with the functions below.
The functions standardize only ICD-9 codes. There are two ways to distinguish the ICD version (ICD-9 or ICD-10) used in the data: one is an extra column that records the version (the data type of this column should be numeric, 9 or 10); the other is a specific date on which the dataset starts using ICD-10. For example, reimbursement claims were required to use ICD-10 codes starting October 1st, 2015 in the United States and January 1st, 2016 in Taiwan.
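The date-based approach can be illustrated with a short base R sketch. The variable names and dates are hypothetical, and the sketch assumes that a record dated on the starting date itself already counts as ICD-10; how dxpr treats that boundary is not stated here.

```r
# Hypothetical record dates; the cutoff mirrors the icd10usingDate argument.
recordDates <- as.Date(c("2008-01-14", "2015-09-30", "2015-10-01", "2016-03-02"))
icd10Start  <- as.Date("2015/10/01")

# Records dated before the cutoff are treated as ICD-9, the rest as ICD-10.
icdVersion <- ifelse(recordDates < icd10Start, 9, 10)
icdVersion
#> [1]  9  9 10 10
```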
decimal <- icdPrShortToDecimal(prDataFile = samplePrFile,
icdColName = ICD,
dateColName = Date,
icd10usingDate = "2015/10/01")
head(decimal$ICD)
#> ICD
#> 1: 56.81
#> 2: 97.74
#> 3: 44.99
#> 4: 07.59
#> 5: 02.05
#> 6: 88.12
The icdPrDecimalToShort function converts the procedure codes to the short format, which is the format used by the grouping functions (icdPrToCCS, icdPrToCCSLvl, and icdPrToProcedureClass).
short <- icdPrDecimalToShort(prDataFile = samplePrFile,
icdColName = ICD,
dateColName = Date,
icd10usingDate = "2015/10/01")
head(short$ICD)
#> ICD
#> 1: 5681
#> 2: 9774
#> 3: 4499
#> 4: 0759
#> 5: 0205
#> 6: 8812
Warning message
In addition, the code standardization functions return the procedure codes with potential errors, which helps researchers identify coding mistakes that may affect the results of subsequent clinical data analysis.
There are two error types: wrong format and wrong version. The former means that the ICD code does not exist (for example, because it is miscoded or the decimal point is misplaced). The latter means that the version is wrong (for example, an ICD-9 code is still used after icd10usingDate).
Users can check the data after receiving the warning message.
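The two error types can be illustrated with a simplified base R sketch. It is not the package's implementation: the reference list `validIcd9Short` is a tiny hypothetical subset, the records are made up, and only ICD-9 codes are considered.

```r
# Hypothetical reference list of valid ICD-9 short codes (a tiny subset).
validIcd9Short <- c("5681", "9774", "4499", "0759", "0205", "8812")

# Hypothetical records: the third has a misplaced decimal point and the
# fourth is a valid ICD-9 code dated after the ICD-10 starting date.
records <- data.frame(
  ICD  = c("5681", "44.99", "449.9", "0759"),
  Date = as.Date(c("2008-01-14", "2009-05-10", "2009-05-10", "2016-02-01")),
  stringsAsFactors = FALSE
)
icd10Start <- as.Date("2015-10-01")

# A code is accepted only if it has the expected layout (two digits, an
# optional decimal point, one or two digits) and exists in the reference list.
shortForm  <- sub(".", "", records$ICD, fixed = TRUE)
wellFormed <- grepl("^[0-9]{2}\\.?[0-9]{1,2}$", records$ICD)
isIcd9     <- wellFormed & shortForm %in% validIcd9Short

records$issue <- ifelse(!isIcd9, "wrong format",
                 ifelse(records$Date >= icd10Start, "wrong version",
                        NA_character_))
records
```

Rows with a non-missing `issue` value are the ones a researcher would review before grouping.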
Functions stated below collapse ICD codes into a smaller number of clinically meaningful categories that are more useful for presenting descriptive statistics than individual ICD procedure codes are.
dxpr package supports two strategies to group EHR procedure codes, including CCS and procedure classes.
The CCS classification for ICD-9 and ICD-10 codes is a procedure categorization scheme that can be employed in many types of projects that analyze procedure data.
1) single-level: icdPrToCCS
ICD-9-PCS and ICD-10-PCS codes both map to 231 single-level CCS categories, and the categories correspond to each other across the two versions.
## ICD to CCS category
CCS <- icdPrToCCS(prDataFile = samplePrFile,
idColName = ID,
icdColName = ICD,
dateColName = Date,
icd10usingDate = "2015-10-01",
isDescription = FALSE)
head(CCS$groupedDT, 5)
#> Short ID ICD Date CCS_CATEGORY
#> 1: 5681 B 5681 2008-01-14 112
#> 2: 9774 A 9774 2009-01-11 131
#> 3: 4499 B 44.99 2009-05-10 94
#> 4: 0759 C 07.59 2009-01-21 12
#> 5: 0205 B 0205 2008-07-06 9
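Once codes are grouped, descriptive statistics are straightforward to compute. The sketch below uses base R on a hand-copied version of the five rows shown above; the object name `grouped` is hypothetical and stands in for `CCS$groupedDT`.

```r
# Rows copied by hand from the output above, for illustration only.
grouped <- data.frame(
  ID           = c("B", "A", "B", "C", "B"),
  CCS_CATEGORY = c("112", "131", "94", "12", "9"),
  stringsAsFactors = FALSE
)

# Number of procedure records per patient.
recordsPerPatient <- table(grouped$ID)
recordsPerPatient

# Number of distinct CCS categories per patient.
categoriesPerPatient <- tapply(grouped$CCS_CATEGORY, grouped$ID,
                               function(x) length(unique(x)))
categoriesPerPatient
```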
2) multi-level: icdPrToCCSLvl
Multi-level CCS for ICD-9-PCS has three levels, and multi-level CCS for ICD-10-PCS has two levels.
## ICD to CCS multiple level 2 description
CCSLvl <- icdPrToCCSLvl(prDataFile = samplePrFile,
idColName = ID,
icdColName = ICD,
dateColName = Date,
icd10usingDate = "2015-10-01",
CCSLevel = 2,
isDescription = TRUE)
head(CCSLvl$groupedDT, 5)
#> Short ID ICD Date CCS_LVL_2_LABEL
#> 1: 5681 B 5681 2008-01-14 Other OR therapeutic procedures of urinary tract
#> 2: 9774 A 9774 2009-01-11 Other non-OR therapeutic procedures; female organs
#> 3: 4499 B 44.99 2009-05-10 Other OR upper GI therapeutic procedures
#> 4: 0759 C 07.59 2009-01-21 Other therapeutic endocrine procedures
#> 5: 0205 B 0205 2008-07-06 Other OR therapeutic nervous system procedures
Procedure Classes are part of the family of databases and software
tools developed as part of the Healthcare Cost and Utilization Project
(HCUP) by AHRQ.
The Procedure Classes assign all ICD procedure codes to one of four
categories: (1) Minor Diagnostic, (2) Minor Therapeutic, (3) Major Diagnostic, and (4) Major Therapeutic.
ProcedureClass <- icdPrToProcedureClass(prDataFile = samplePrFile,
idColName = ID,
icdColName = ICD,
dateColName = Date,
icd10usingDate = "2015-10-01",
isDescription = FALSE)
head(ProcedureClass$groupedDT, 5)
#> Short ID ICD Date PROCEDURE_CLASS
#> 1: 5681 B 5681 2008-01-14 4
#> 2: 9774 A 9774 2009-01-11 2
#> 3: 4499 B 44.99 2009-05-10 4
#> 4: 0759 C 07.59 2009-01-21 4
#> 5: 0205 B 0205 2008-07-06 4
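The numeric classes can be turned into readable labels for reporting. The sketch below is a hand-written mapping in base R that assumes the HCUP category names (1 = Minor Diagnostic, 2 = Minor Therapeutic, 3 = Major Diagnostic, 4 = Major Therapeutic); the object names are hypothetical and the values are copied from the output above.

```r
# Values of PROCEDURE_CLASS copied by hand from the output above.
procedureClass <- c(4, 2, 4, 4, 4)

# Assumed label for each class number, in order 1 to 4.
classLabels <- c("Minor Diagnostic", "Minor Therapeutic",
                 "Major Diagnostic", "Major Therapeutic")

# A factor keeps all four levels, so empty classes still appear in the table.
labelled <- factor(procedureClass, levels = 1:4, labels = classLabels)
table(labelled)
```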
ICD-9-PCS code (2014): https://www.cms.gov/Medicare/Coding/ICD9ProviderDiagnosticCodes/codes.html
ICD-10-PCS code (2019): https://www.cms.gov/Medicare/Coding/ICD10/2019-ICD-10-PCS.html
Clinical Classifications Software (CCS)
ICD-9-PCS CCS (2015): https://www.hcup-us.ahrq.gov/toolssoftware/ccs/Single_Level_CCS_2015.zip
https://www.hcup-us.ahrq.gov/toolssoftware/ccs/Multi_Level_CCS_2015.zip
ICD-10-PCS CCS (2019): https://www.hcup-us.ahrq.gov/toolssoftware/procedureicd10/procedure_icd10.jsp
Procedure Class
ICD-9-Procedure Class (2015): https://www.hcup-us.ahrq.gov/toolssoftware/procedure/pc2015.csv
ICD-10-Procedure Class (2019): https://www.hcup-us.ahrq.gov/toolssoftware/procedureicd10/procedure_icd10.jsp