This query function selects the cases that match user-defined conditions for analysis.

selectCases(
  dxDataFile,
  idColName,
  icdColName,
  dateColName,
  icdVerColName = NULL,
  icd10usingDate = NULL,
  groupDataType = ccs,
  customGroupingTable,
  isDescription = TRUE,
  caseCondition,
  caseCount = 1,
  periodRange = c(30, 365),
  caseName = "Selected"
)

Arguments

dxDataFile

A data frame of clinical diagnostic data with at least three columns: ID, ICD, and Date. Values in the date column should be formatted as YYYY/MM/DD or YYYY-MM-DD.
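As a minimal sketch of the expected input shape, the following base R snippet builds a small data frame with the three required columns and checks that both accepted date spellings parse to the same Date. The object name `dx` and its values are illustrative assumptions, not part of the package; the column names mirror the sample file shown in the examples.

```r
# Hypothetical toy input; column names mirror the sample file (ID, ICD, Date).
dx <- data.frame(
  ID   = c("A1", "A1", "B0"),
  ICD  = c("5855", "V420", "N185"),
  Date = c("2006/11/29", "2014/09/24", "2015/12/26"),
  stringsAsFactors = FALSE
)

# as.Date() picks one format per vector (based on the first non-missing value),
# so keep a single spelling within a column.
dx$Date <- as.Date(dx$Date)

# Both accepted spellings describe the same day.
sameDate <- identical(as.Date("2015/10/01"), as.Date("2015-10-01"))
str(dx)
```

Because base R chooses the format from the first non-missing element, mixing slashes and hyphens within one column is best avoided before the data are passed on.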

idColName

Name of the ID column in dxDataFile. This argument should be given as a string without quotation marks.

icdColName

Name of the ICD column in dxDataFile. This argument should be given as a string without quotation marks.

dateColName

Name of the date column in dxDataFile. The column should be either an R date type or a string holding a date in YYYY/MM/DD or YYYY-MM-DD format. This argument should be given as a string without quotation marks.

icdVerColName

(Optional) Name of the column, if any, that records the ICD version (ICD-9 or ICD-10) used in dxDataFile. Values in this column should be numeric, 9L or 10L, indicating which ICD version is used for each record. See the examples below for more information.

icd10usingDate

The date on which ICD-10 came into use in the dxDataFile dataset, formatted as YYYY/MM/DD or YYYY-MM-DD. Required if icdVerColName is NULL.
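The relationship between the two options can be sketched in base R: when no version column exists, a cutoff date implies a version for each record. This is only an illustration of the idea, assuming records dated on or after the cutoff are ICD-10; it is not the package's internal code, and `visits` and `cutoff` are hypothetical names.

```r
# Hypothetical records; the dates are taken from the sample output for A8 and A13.
visits <- data.frame(
  ID   = c("A8", "A8", "A13"),
  Date = as.Date(c("2007-11-22", "2015-10-27", "2025-02-02"))
)

cutoff <- as.Date("2015/10/01")

# Assumed convention: 9L before the cutoff, 10L on or after it.
visits$Version <- ifelse(visits$Date < cutoff, 9L, 10L)
visits
```

A column built this way has the same shape as the one icdVerColName expects, which may help when only part of a dataset follows a clean cutoff.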

groupDataType

The grouping method to apply. Available methods: CCS (ccs), CCSR (ccsr), multiple-level CCS (ccslvl1, ccslvl2, ccslvl3, ccslvl4), PheWAS (PheWAS), comorbidities (ahrq, charlson, elix), and precise or fuzzy customized methods (customIcdGroup, customGrepIcdGroup). The value should be one of the strings above, without quotation marks. The default is ccs. To select cases by ungrouped ICD codes, use the method ICD (ICD).

customGroupingTable

User-defined grouping categories. icdDxToCustom needs a dataset with two columns, "Group" and "ICD": the "Group" column defines one or more disease categories, and the "ICD" column lists the ICD codes that belong to each category. icdDxToCustomGrep needs a dataset with two columns, "Group" and "grepIcd": "Group" defines one or more disease categories, and "grepIcd" holds character strings containing regular expressions that match the disease-related ICD codes.
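The two table layouts can be sketched in base R as follows. The table names (`exactTable`, `grepTable`), the group label, and the codes are illustrative assumptions; the `%in%` and `grepl()` calls only demonstrate the difference between exact and pattern matching and are not the package's implementation.

```r
# Exact-match layout: columns named "Group" and "ICD".
exactTable <- data.frame(
  Group = c("Chronic kidney disease", "Chronic kidney disease"),
  ICD   = c("N185", "N186"),
  stringsAsFactors = FALSE
)

# Pattern layout: columns named "Group" and "grepIcd" (a regular expression).
grepTable <- data.frame(
  Group   = "Chronic kidney disease",
  grepIcd = "^N18|^585",
  stringsAsFactors = FALSE
)

codes <- c("N185", "5855", "V420")

exactHit <- codes %in% exactTable$ICD        # only codes listed verbatim
grepHit  <- grepl(grepTable$grepIcd, codes)  # any code matching the pattern
data.frame(codes, exactHit, grepHit)
```

The pattern layout is shorter to write when a category covers a whole code prefix, while the exact layout avoids accidental matches.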

isDescription

Logical. If TRUE, the category description of the classification method is used in the group column. If FALSE, the category name is used. The default is TRUE (standard category description).

caseCondition

The disease or diseases to select. The condition can be a specific ICD code, a CCS category description, etc. A string containing a regular expression is also supported.

caseCount

Minimum number of diagnoses required for a patient to be selected. If caseCount = 2, only patients who have been diagnosed at least twice are selected. The default is 1.

caseName

Label that marks selected cases. The value is written to the labeling column called selectedCase. The default is "Selected".

periodRange

The duration of interest for case selection, given as a vector holding the lower and upper bounds in days. The default is 30 to 365 days, c(30, 365).
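How caseCount and periodRange interact is not spelled out above. One plausible reading is that a patient qualifies when some run of caseCount matching diagnoses spans a number of days inside the range. The base R sketch below implements that reading only; `qualifies` is a hypothetical helper and is not claimed to reproduce the package's internal logic.

```r
# Hypothetical helper: TRUE if any run of `caseCount` diagnoses spans a number
# of days within periodRange (bounds inclusive). Assumed semantics only.
qualifies <- function(dates, caseCount, periodRange = c(30, 365)) {
  dates <- sort(as.Date(dates))
  n <- length(dates)
  if (n < caseCount) return(FALSE)
  if (caseCount <= 1) return(TRUE)
  # Days between each diagnosis and the one (caseCount - 1) positions later
  span <- as.numeric(dates[caseCount:n] - dates[1:(n - caseCount + 1)])
  any(span >= periodRange[1] & span <= periodRange[2])
}

r60  <- qualifies(c("2020-01-01", "2020-03-01"), caseCount = 2)  # 60 days apart
r9   <- qualifies(c("2020-01-01", "2020-01-10"), caseCount = 2)  # 9 days apart
r731 <- qualifies(c("2020-01-01", "2022-01-01"), caseCount = 2)  # 731 days apart
c(r60, r9, r731)
```

Under this reading, the lower bound filters out repeat visits that are too close together to count as separate confirmations, and the upper bound filters out diagnoses too far apart to be considered related.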

Value

A new data.table based on the standard classification dataset, with a new column, selectedCase, in which each patient is labeled as selected or not. If a patient was diagnosed with the specified diseases but does not satisfy the selection condition, the selectedCase cell is labeled with a star (*).

Details

Users can select cases by diagnostic category, such as CCS category or ICD code. The function also provides options to set the minimum number of diagnoses within a specific duration. The output dataset can be passed to `groupedDataLongToWide` to create wide-format tables for statistical analysis.

See also

Other data integration functions: splitDataByDate, getEligiblePeriod, getConditionEra

Examples

# sample file for example

head(sampleDxFile)
#>     ID  ICD       Date Version
#> 1:  A2 Z992 2020-05-22      10
#> 2:  A5 Z992 2020-01-24      10
#> 3:  A8 Z992 2015-10-27      10
#> 4: A13 Z992 2020-04-26      10
#> 5: A13 Z992 2025-02-02      10
#> 6: A15 Z992 2023-05-12      10

# select cases with "Diseases of the urinary system" by level 2 of the CCS classification

selectCases(dxDataFile = sampleDxFile,
            ID, ICD, Date,
            icdVerColName = NULL,
            groupDataType = ccslvl2,
            icd10usingDate = "2015/10/01",
            caseCondition = "Diseases of the urinary system",
            caseCount = 1)
#> Wrong ICD format: total 9 ICD codes (the number of occurrences is in brackets)
#> c("A0.11 (20)", "E114 (8)", "Z9.90 (6)", "F42 (6)", "001 (5)", "75.52 (4)", "755.2 (3)", "123.45 (3)", "7552 (2)")
#> 	
#> Wrong ICD version: total 7 ICD codes (the number of occurrences is in brackets)
#> c("V27.0 (18)", "A01.05 (8)", "42761 (7)", "V24.1 (6)", "A0105 (5)", "E03.0 (4)", "650 (4)")
#> 	
#> Warning: The ICD mentioned above matches to "NA" due to the format or other issues.
#> Warning: "Wrong ICD format" means the ICD has wrong format
#> Warning: "Wrong ICD version" means the ICD classify to wrong ICD version (cause the "icd10usingDate" or other issues)
#>      ID selectedCase count firstCaseDate endCaseDate    period MostCommonICD
#>  1:  A3     Selected     5    2008-07-08  2014-02-24 2057 days          V420
#>  2:  A1     Selected     5    2006-11-29  2014-09-24 2856 days          5855
#>  3: A10     Selected     5    2007-11-04  2012-07-30 1730 days         V5631
#>  4: A12     Selected     5    2006-05-14  2015-06-29 3333 days          5859
#>  5: A13     Selected     5    2006-04-29  2025-02-02 6854 days          5855
#>  6: A15     Selected     5    2007-05-25  2023-05-12 5831 days         V5631
#>  7: A18     Selected     5    2007-04-05  2014-03-04 2525 days          5855
#>  8:  A2     Selected     5    2011-09-20  2020-05-22 3167 days          5855
#>  9:  A6     Selected     5    2007-10-01  2015-07-12 2841 days         V4512
#> 10:  A9     Selected     5    2007-03-05  2013-11-09 2441 days          V420
#> 11:  B0     Selected     6    2015-12-26  2024-02-12 2970 days          N185
#> 12:  B1     Selected     6    2016-08-08  2024-03-04 2765 days          N183
#> 13:  B2     Selected     6    2016-03-20  2024-09-20 3106 days          N186
#> 14:  B3     Selected     6    2019-05-07  2025-05-25 2210 days          N189
#> 15:  B4     Selected     6    2015-12-02  2025-07-21 3519 days          N185
#> 16:  A0     Selected     5    2009-07-25  2013-12-20 1609 days          5856
#> 17: A11     Selected     5    2008-03-09  2011-09-03 1273 days          5855
#> 18: A14     Selected     5    2006-11-28  2014-12-21 2945 days          V560
#> 19: A16     Selected     5    2007-04-15  2014-12-05 2791 days         V5631
#> 20: A17     Selected     5    2007-02-19  2014-07-03 2691 days          5856
#> 21:  A4     Selected     5    2006-10-20  2015-03-09 3062 days         V5631
#> 22:  A5     Selected     5    2009-09-10  2020-01-24 3788 days          V420
#> 23:  A7     Selected     5    2007-02-01  2014-08-14 2751 days          5854
#> 24:  A8     Selected     5    2007-11-22  2015-10-27 2896 days          V561
#> 25:  C0 non-Selected    NA          <NA>        <NA>   NA days          <NA>
#> 26:  C1 non-Selected    NA          <NA>        <NA>   NA days          <NA>
#> 27:  C2 non-Selected    NA          <NA>        <NA>   NA days          <NA>
#> 28:  C3 non-Selected    NA          <NA>        <NA>   NA days          <NA>
#> 29:  C4 non-Selected    NA          <NA>        <NA>   NA days          <NA>
#> 30:  D0 non-Selected    NA          <NA>        <NA>   NA days          <NA>
#> 31:  D1 non-Selected    NA          <NA>        <NA>   NA days          <NA>
#> 32:  D2 non-Selected    NA          <NA>        <NA>   NA days          <NA>
#> 33:  D3 non-Selected    NA          <NA>        <NA>   NA days          <NA>
#> 34:  D4 non-Selected    NA          <NA>        <NA>   NA days          <NA>
#> 35:  D5 non-Selected    NA          <NA>        <NA>   NA days          <NA>
#> 36:  D6 non-Selected    NA          <NA>        <NA>   NA days          <NA>
#> 37:  D7 non-Selected    NA          <NA>        <NA>   NA days          <NA>
#> 38:  D8 non-Selected    NA          <NA>        <NA>   NA days          <NA>
#>      ID selectedCase count firstCaseDate endCaseDate    period MostCommonICD
#>     MostCommonICDCount
#>  1:                  3
#>  2:                  2
#>  3:                  2
#>  4:                  2
#>  5:                  2
#>  6:                  2
#>  7:                  2
#>  8:                  2
#>  9:                  2
#> 10:                  2
#> 11:                  2
#> 12:                  2
#> 13:                  2
#> 14:                  2
#> 15:                  2
#> 16:                  1
#> 17:                  1
#> 18:                  1
#> 19:                  1
#> 20:                  1
#> 21:                  1
#> 22:                  1
#> 23:                  1
#> 24:                  1
#> 25:                 NA
#> 26:                 NA
#> 27:                 NA
#> 28:                 NA
#> 29:                 NA
#> 30:                 NA
#> 31:                 NA
#> 32:                 NA
#> 33:                 NA
#> 34:                 NA
#> 35:                 NA
#> 36:                 NA
#> 37:                 NA
#> 38:                 NA
#>     MostCommonICDCount
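
Once a result like the one above is in hand, downstream filtering is ordinary data-frame work. The sketch below rebuilds three rows of the printed output by hand so that it runs without the package, then keeps only the selected patients; `res` is a hypothetical stand-in for the returned table.

```r
# Hand-copied subset of the printed result (rows 1, 11, and 25 above).
res <- data.frame(
  ID           = c("A3", "B0", "C0"),
  selectedCase = c("Selected", "Selected", "non-Selected"),
  count        = c(5L, 6L, NA),
  stringsAsFactors = FALSE
)

# Keep the IDs whose label matches the caseName used in the call.
selectedIds <- res$ID[res$selectedCase == "Selected"]
selectedIds
```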