This query function selects the cases (patients) whose diagnoses match defined conditions, for use in analyses.
selectCases(
dxDataFile,
idColName,
icdColName,
dateColName,
icdVerColName = NULL,
icd10usingDate = NULL,
groupDataType = CCS,
customGroupingTable,
isDescription = TRUE,
caseCondition,
caseCount = 2,
periodRange = c(30, 365),
caseName = "Selected"
)
dxDataFile: A data frame of clinical diagnostic data with at least 3 columns: ID, ICD, and Date. Dates should be in YYYY/MM/DD or YYYY-MM-DD format.
idColName: Name of the ID column in dxDataFile, given as a bare name without quotation marks.
icdColName: Name of the ICD column in dxDataFile, given as a bare name without quotation marks.
dateColName: Name of the date column in dxDataFile, given as a bare name without quotation marks. The column should be of R's Date class or a character string in YYYY/MM/DD or YYYY-MM-DD format.
icdVerColName: (Optional) Name of the column in dxDataFile that records which ICD version (ICD-9 or ICD-10) each code uses. Its values should be numeric, 9L or 10L. See the examples below for more information.
icd10usingDate: The date from which ICD-10 codes are used in dxDataFile, in YYYY/MM/DD or YYYY-MM-DD format. Required if icdVerColName is NULL.
groupDataType: The grouping method. Available methods are CCS (ccs), multiple-level CCS (ccslvl1, ccslvl2, ccslvl3, ccslvl4), CCSR (ccsr), PheWAS (PheWAS), comorbidities (ahrq, charlson, elix), and precise or fuzzy custom grouping (customIcdGroup, customGrepIcdGroup). Give the method name without quotation marks. The default is CCS. To select cases by un-grouped ICD codes, use ICD.
customGroupingTable: User-defined grouping categories. icdDxToCustom needs a table with two columns, "Group" and "ICD": "Group" defines one or more disease categories, and "ICD" lists the ICD codes belonging to each category. icdDxToCustomGrep needs a table with two columns, "Group" and "grepIcd": "Group" defines one or more disease categories, and "grepIcd" holds disease-related ICD code strings, which may contain regular expressions.
isDescription: Logical. If TRUE, the category description of the classification method is used in the group column; if FALSE, the category name is used. Default is TRUE (standard category description).
caseCondition: The diseases to select. The condition can be a specific ICD code, a CCS category description, etc. Strings with regular expressions are also supported.
caseCount: Minimum number of diagnoses a patient needs in order to be selected. If caseCount = 2, only patients diagnosed at least twice are selected. Default is 2.
caseName: Value used to mark whether a patient is selected; it is filled into the labeling column selectedCase. Default is "Selected".
periodRange: Duration of interest, in days, for performing the case selection, given as a vector of the lower and upper bounds. Default is c(30, 365), i.e., from 30 to 365 days.
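The table shapes described above can be sketched in base R. The block below builds a hypothetical toy dxDataFile (ID, ICD and Date columns, with dates parsed from YYYY/MM/DD) and the two kinds of customGroupingTable: one with "Group" and "ICD" columns for precise grouping, one with "Group" and "grepIcd" columns for regular-expression grouping. All object names, codes and the "CKD" group are illustrative assumptions. The selectCases() call runs only when the dxpr package (assumed here to be the package providing selectCases) is installed, and its argument values are an unverified sketch.

```r
# Toy diagnosis data (hypothetical): one row per diagnosis with ID, ICD and Date columns.
# Dates are parsed from the YYYY/MM/DD format into R's Date class.
toyDx <- data.frame(
  ID   = c("P1", "P1", "P2", "P3"),
  ICD  = c("N185", "N189", "N183", "E119"),
  Date = as.Date(c("2020/01/15", "2020/03/01", "2021/06/30", "2019/12/31"),
                 format = "%Y/%m/%d"),
  stringsAsFactors = FALSE
)

# Precise grouping table (customIcdGroup): columns "Group" and "ICD", one code per row.
icdGroupTable <- data.frame(
  Group = "CKD",
  ICD   = c("N183", "N185", "N189"),
  stringsAsFactors = FALSE
)

# Fuzzy grouping table (customGrepIcdGroup): columns "Group" and "grepIcd" (regular expressions).
grepGroupTable <- data.frame(
  Group   = "CKD",
  grepIcd = "^N18",
  stringsAsFactors = FALSE
)

# Base R preview of which toy codes the fuzzy pattern would match.
fuzzyMatches <- toyDx$ICD[grepl(grepGroupTable$grepIcd, toyDx$ICD)]
print(fuzzyMatches)

# Only attempted when dxpr is installed; argument values are illustrative.
if (requireNamespace("dxpr", quietly = TRUE)) {
  library(dxpr)
  selectCases(dxDataFile = toyDx,
              ID, ICD, Date,
              icd10usingDate = "2015/10/01",
              groupDataType = customIcdGroup,
              customGroupingTable = icdGroupTable,
              caseCondition = "CKD",
              caseCount = 2)
}
```

Building the grouping tables as plain data frames keeps them easy to inspect before they are handed to selectCases().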
A new data.table, based on the standard classification dataset, with a labeling column, selectedCase, that marks each patient as selected or not. If a patient was diagnosed with the disease of interest but does not satisfy the selection condition, the selectedCase cell is labeled with a star (*). As the example output shows, the table also includes the count, firstCaseDate, endCaseDate, period, MostCommonICD, and MostCommonICDCount columns.
Users can select cases by diagnostic category, such as a CCS category or an ICD code. The function also lets users set the minimum number of diagnoses within a specific duration. The output dataset can be passed to `groupedDataLongToWide` to create wide-format tables for statistical analysis.
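Selecting by un-grouped ICD codes with a minimum diagnosis count might look like the sketch below. It assumes that, with groupDataType = ICD, caseCondition is matched as a regular expression against short-format ICD codes (no dots, as in the MostCommonICD column of the example output). The chronic kidney disease pattern (ICD-10 N18, ICD-9 585), the caseName value "CKD" and the object names are hypothetical choices. The grepl() preview runs in base R; the selectCases() call runs only if the dxpr package is installed.

```r
# Hypothetical pattern for chronic kidney disease codes in short format:
# ICD-10 N18.x ("N18...") or ICD-9 585.x ("585...").
ckdPattern <- "^N18|^585"

# Base R preview: which example codes would the pattern match?
matched <- grepl(ckdPattern, c("N185", "5855", "V420", "N390"))
print(matched)

# Only attempted when dxpr is installed.
if (requireNamespace("dxpr", quietly = TRUE)) {
  library(dxpr)
  ckdCases <- selectCases(dxDataFile = sampleDxFile,
                          ID, ICD, Date,
                          icd10usingDate = "2015/10/01",
                          groupDataType = ICD,
                          caseCondition = "^N18|^585",
                          caseCount = 2,
                          periodRange = c(30, 365),
                          caseName = "CKD")
  # Number of patients carrying each label in selectedCase.
  print(table(ckdCases$selectedCase))
}
```

Previewing the pattern with grepl() first is a cheap way to check that it matches the intended codes before running the full selection.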
Other data integration functions: splitDataByDate, getEligiblePeriod, getConditionEra
library(dxpr)
# Sample diagnosis file for the example
head(sampleDxFile)
#> ID ICD Date Version
#> 1: A2 Z992 2020-05-22 10
#> 2: A5 Z992 2020-01-24 10
#> 3: A8 Z992 2015-10-27 10
#> 4: A13 Z992 2020-04-26 10
#> 5: A13 Z992 2025-02-02 10
#> 6: A15 Z992 2023-05-12 10
# Select cases with "Diseases of the urinary system" using level 2 of the CCS classification
selectCases(dxDataFile = sampleDxFile,
ID, ICD, Date,
icdVerColName = NULL,
groupDataType = ccslvl2,
icd10usingDate = "2015/10/01",
caseCondition = "Diseases of the urinary system",
caseCount = 1)
#> Wrong ICD format: total 9 ICD codes (the number of occurrences is in brackets)
#> c("A0.11 (20)", "E114 (8)", "Z9.90 (6)", "F42 (6)", "001 (5)", "75.52 (4)", "755.2 (3)", "123.45 (3)", "7552 (2)")
#>
#> Wrong ICD version: total 7 ICD codes (the number of occurrences is in brackets)
#> c("V27.0 (18)", "A01.05 (8)", "42761 (7)", "V24.1 (6)", "A0105 (5)", "E03.0 (4)", "650 (4)")
#>
#> Warning: The ICD mentioned above matches to "NA" due to the format or other issues.
#> Warning: "Wrong ICD format" means the ICD has wrong format
#> Warning: "Wrong ICD version" means the ICD classify to wrong ICD version (cause the "icd10usingDate" or other issues)
#> ID selectedCase count firstCaseDate endCaseDate period MostCommonICD MostCommonICDCount
#> 1: A3 Selected 5 2008-07-08 2014-02-24 2057 days V420 3
#> 2: A1 Selected 5 2006-11-29 2014-09-24 2856 days 5855 2
#> 3: A10 Selected 5 2007-11-04 2012-07-30 1730 days V5631 2
#> 4: A12 Selected 5 2006-05-14 2015-06-29 3333 days 5859 2
#> 5: A13 Selected 5 2006-04-29 2025-02-02 6854 days 5855 2
#> 6: A15 Selected 5 2007-05-25 2023-05-12 5831 days V5631 2
#> 7: A18 Selected 5 2007-04-05 2014-03-04 2525 days 5855 2
#> 8: A2 Selected 5 2011-09-20 2020-05-22 3167 days 5855 2
#> 9: A6 Selected 5 2007-10-01 2015-07-12 2841 days V4512 2
#> 10: A9 Selected 5 2007-03-05 2013-11-09 2441 days V420 2
#> 11: B0 Selected 6 2015-12-26 2024-02-12 2970 days N185 2
#> 12: B1 Selected 6 2016-08-08 2024-03-04 2765 days N183 2
#> 13: B2 Selected 6 2016-03-20 2024-09-20 3106 days N186 2
#> 14: B3 Selected 6 2019-05-07 2025-05-25 2210 days N189 2
#> 15: B4 Selected 6 2015-12-02 2025-07-21 3519 days N185 2
#> 16: A0 Selected 5 2009-07-25 2013-12-20 1609 days 5856 1
#> 17: A11 Selected 5 2008-03-09 2011-09-03 1273 days 5855 1
#> 18: A14 Selected 5 2006-11-28 2014-12-21 2945 days V560 1
#> 19: A16 Selected 5 2007-04-15 2014-12-05 2791 days V5631 1
#> 20: A17 Selected 5 2007-02-19 2014-07-03 2691 days 5856 1
#> 21: A4 Selected 5 2006-10-20 2015-03-09 3062 days V5631 1
#> 22: A5 Selected 5 2009-09-10 2020-01-24 3788 days V420 1
#> 23: A7 Selected 5 2007-02-01 2014-08-14 2751 days 5854 1
#> 24: A8 Selected 5 2007-11-22 2015-10-27 2896 days V561 1
#> 25: C0 non-Selected NA <NA> <NA> NA days <NA> NA
#> 26: C1 non-Selected NA <NA> <NA> NA days <NA> NA
#> 27: C2 non-Selected NA <NA> <NA> NA days <NA> NA
#> 28: C3 non-Selected NA <NA> <NA> NA days <NA> NA
#> 29: C4 non-Selected NA <NA> <NA> NA days <NA> NA
#> 30: D0 non-Selected NA <NA> <NA> NA days <NA> NA
#> 31: D1 non-Selected NA <NA> <NA> NA days <NA> NA
#> 32: D2 non-Selected NA <NA> <NA> NA days <NA> NA
#> 33: D3 non-Selected NA <NA> <NA> NA days <NA> NA
#> 34: D4 non-Selected NA <NA> <NA> NA days <NA> NA
#> 35: D5 non-Selected NA <NA> <NA> NA days <NA> NA
#> 36: D6 non-Selected NA <NA> <NA> NA days <NA> NA
#> 37: D7 non-Selected NA <NA> <NA> NA days <NA> NA
#> 38: D8 non-Selected NA <NA> <NA> NA days <NA> NA
#> ID selectedCase count firstCaseDate endCaseDate period MostCommonICD MostCommonICDCount
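sampleDxFile also has a Version column, so the ICD version could be read from it through icdVerColName instead of relying on icd10usingDate. The sketch below assumes every Version value is 9 or 10 (only 10 is visible in the head() output above) and adds a hypothetical helper, isValidIcdVersion(), that checks the 9L/10L requirement before calling selectCases(). The call runs only if the dxpr package is installed; no output is shown because it has not been verified.

```r
# Hypothetical helper: TRUE when every value in a version column is 9 or 10,
# the values icdVerColName expects (9L or 10L). NA, other values or an empty vector give FALSE.
isValidIcdVersion <- function(v) {
  length(v) > 0 && all(v %in% c(9L, 10L))
}

# Only attempted when dxpr is installed.
if (requireNamespace("dxpr", quietly = TRUE)) {
  library(dxpr)
  if (isValidIcdVersion(sampleDxFile$Version)) {
    # Read each code's ICD version from the Version column instead of icd10usingDate.
    versionCases <- selectCases(dxDataFile = sampleDxFile,
                                ID, ICD, Date,
                                icdVerColName = Version,
                                groupDataType = ccslvl2,
                                caseCondition = "Diseases of the urinary system",
                                caseCount = 1)
    print(versionCases)
  }
}
```

Checking the version column up front avoids codes being classified under the wrong ICD version, which the function otherwise reports as "Wrong ICD version".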