How to Query Metadata from the Fragile Families API

Ryan Vinh and Ian E. Fellows

2018-11-12

Introduction

The ffmetadata package provides easy-to-use access to metadata about the Fragile Families Project data (https://fragilefamilies.princeton.edu/). The data itself is complex, but this tool makes it easy to find and filter information about the variables it contains by querying the Fragile Families web API. Two functions are available: select_metadata() and search_metadata().

select_metadata()

Using select_metadata() to find information about a variable

Selecting one field

Suppose we want to find out the value of a given variable’s field. For example, let’s say we want to find out the source of the variable with the name “ce3datey”. To accomplish this, we would call select_metadata() using “ce3datey” for variable_name and “data_source” for fields.
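A minimal sketch of that call follows. The argument names variable_name and fields are the ones named in the text; the requireNamespace() guard is an addition, so the snippet only contacts the API when ffmetadata is installed.

```r
# Sketch only: needs the ffmetadata package and network access to the API.
variable <- "ce3datey"
field <- "data_source"

if (requireNamespace("ffmetadata", quietly = TRUE)) {
  # Ask for a single field of a single variable.
  source_value <- ffmetadata::select_metadata(variable_name = variable,
                                              fields = field)
  print(source_value)
}
```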

## [1] "constructed"

Selecting multiple fields

select_metadata() can also be used to find out information about several fields of a given variable along the following lines:
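As a sketch, several fields could be requested at once by passing them together. Supplying the fields as a character vector is an assumption here; the field names themselves are taken from the output shown in this document.

```r
# Sketch only: needs the ffmetadata package and network access to the API.
variable <- "ce3datey"
# Assumption: multiple fields are supplied as a character vector.
wanted_fields <- c("data_source", "data_type")

if (requireNamespace("ffmetadata", quietly = TRUE)) {
  several_fields <- ffmetadata::select_metadata(variable_name = variable,
                                                fields = wanted_fields)
  print(several_fields)
}
```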

##        data_source   data_type   
## result "constructed" "Continuous"

Selecting the entire variable

If we want to view the entire variable and the values of all its fields, we can call select_metadata() with “ce3datey” for variable_name and omit the fields parameter. This returns ce3datey as a single data frame row, with each field in its own column.
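A sketch of the full-variable lookup, under the same assumptions as before (ffmetadata installed, API reachable), simply leaves fields out:

```r
# Sketch only: needs the ffmetadata package and network access to the API.
variable <- "ce3datey"

if (requireNamespace("ffmetadata", quietly = TRUE)) {
  # No fields argument, so every field of the variable is returned.
  full_record <- ffmetadata::select_metadata(variable_name = variable)
  print(full_record)
}
```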

##        data_source   data_type    focal_person fp_PCG fp_father fp_fchild
## result "constructed" "Continuous" ""           "0"    "0"       "0"      
##        fp_mother fp_other fp_partner group_id id       in_FFC_file
## result "0"       "0"      "0"        "1520a"  "179209" "Yes"      
##        label                        leaf    n_cities_asked name      
## result "date of observation - year" "datey" "20"           "ce3datey"
##        old_name             respondent            subtopics 
## result "ffcc_centobs_datey" "Child Care Provider" "paradata"
##        survey                           topics                 warning    
## result "Child Care Center Observations" "Paradata and weights" "No Issues"
##        wave    
## result "Year 3"

Modifying the return type

For those who seek greater control over the formatting process, the returnDataFrame parameter can be set to FALSE. This causes select_metadata() to return a nested list object that aligns more directly with the underlying JSON representation of the data. By default, select_metadata() returns a data frame.
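The list form could be requested roughly as follows. This is a sketch: the returnDataFrame argument is named in the text, and pulling out the label element afterwards is only an illustration of working with the list.

```r
# Sketch only: needs the ffmetadata package and network access to the API.
variable <- "ce3datey"
as_data_frame <- FALSE

if (requireNamespace("ffmetadata", quietly = TRUE)) {
  # Request the nested list instead of the default data frame.
  meta <- ffmetadata::select_metadata(variable_name = variable,
                                      returnDataFrame = as_data_frame)
  # Individual fields can then be read by name.
  print(meta[["label"]])
}
```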

## $data_source
## [1] "constructed"
## 
## $data_type
## [1] "Continuous"
## 
## $focal_person
## [1] ""
## 
## $fp_PCG
## [1] 0
## 
## $fp_father
## [1] 0
## 
## $fp_fchild
## [1] 0
## 
## $fp_mother
## [1] 0
## 
## $fp_other
## [1] 0
## 
## $fp_partner
## [1] 0
## 
## $group_id
## [1] "1520a"
## 
## $id
## [1] 179209
## 
## $in_FFC_file
## [1] "Yes"
## 
## $label
## [1] "date of observation - year"
## 
## $leaf
## [1] "datey"
## 
## $n_cities_asked
## [1] 20
## 
## $name
## [1] "ce3datey"
## 
## $old_name
## [1] "ffcc_centobs_datey"
## 
## $probe
## NULL
## 
## $qtext
## NULL
## 
## $respondent
## [1] "Child Care Provider"
## 
## $scale
## NULL
## 
## $section
## NULL
## 
## $subtopics
## [1] "paradata"
## 
## $survey
## [1] "Child Care Center Observations"
## 
## $topics
## [1] "Paradata and weights"
## 
## $warning
## [1] "No Issues"
## 
## $wave
## [1] "Year 3"

search_metadata()

Using search_metadata() to search for variables

search_metadata() allows users to search for variables based on specified field values. This function returns a list of all the variable names that match the specified parameters. For instance, suppose we want to search for all the variables from the “Year 1” wave. To accomplish this, we would call search_metadata() in the following way:
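One plausible form of this call is sketched here. It assumes that search_metadata() accepts field names as named arguments (wave = ...); the text does not spell out the call shape, so treat that as an assumption.

```r
# Sketch only: needs the ffmetadata package and network access to the API.
wave_value <- "Year 1"

if (requireNamespace("ffmetadata", quietly = TRUE)) {
  # Assumption: the field to match is given as a named argument.
  year1_vars <- ffmetadata::search_metadata(wave = wave_value)
  print(head(year1_vars))
}
```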

Multiple fields can be combined in a single search. For example, suppose we want to search for all the variables from the “Year 1” wave that have “Mother” listed as the respondent. To accomplish this, we would call search_metadata() like so:
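A combined search might be written as in this sketch, again assuming that each field is passed as its own named argument and that the conditions are applied together:

```r
# Sketch only: needs the ffmetadata package and network access to the API.
wave_value <- "Year 1"
respondent_value <- "Mother"

if (requireNamespace("ffmetadata", quietly = TRUE)) {
  # Assumption: one named argument per field to match.
  mother_year1 <- ffmetadata::search_metadata(wave = wave_value,
                                              respondent = respondent_value)
  print(head(mother_year1))
}
```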

Using Operations with search_metadata()

search_metadata() also supports operators other than equality. For instance, if we want to find all the variables with names that start with the string “f1”, we can use the “like” operator like so:
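The sketch below shows one way such a pattern search could look. The operation parameter is named in the text; the use of % as a trailing wildcard (SQL LIKE style) and the named-argument form are assumptions.

```r
# Sketch only: needs the ffmetadata package and network access to the API.
# Assumption: "%" acts as a wildcard, as in SQL LIKE patterns.
name_pattern <- "f1%"

if (requireNamespace("ffmetadata", quietly = TRUE)) {
  f1_vars <- ffmetadata::search_metadata(name = name_pattern,
                                         operation = "like")
  print(head(f1_vars))
}
```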

As another example, if we want to find all the variables for which the respondent was either the father or the mother, we can use the “in” operator like so:
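A sketch of the “in” search follows. Passing the candidate values as a character vector is an assumption (the API may instead expect list()), as is the respondent value “Father”, which the text does not quote directly.

```r
# Sketch only: needs the ffmetadata package and network access to the API.
# Assumption: the set of accepted values is given as a character vector.
respondents <- c("Father", "Mother")

if (requireNamespace("ffmetadata", quietly = TRUE)) {
  parent_vars <- ffmetadata::search_metadata(respondent = respondents,
                                             operation = "in")
  print(head(parent_vars))
}
```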

The convention changes slightly for the two operations related to null checking. Rather than being passed through the operation parameter, the null check is specified as the field value. For instance, if we want to find all the variables in which the question text is null, we can format the call like so:
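Following that description, a null check could be sketched as below. The field name qtext is taken from the list output shown earlier; passing the string "is_null" as its value follows the convention described above, and the named-argument form remains an assumption.

```r
# Sketch only: needs the ffmetadata package and network access to the API.
# The null check is supplied as the field's value, not via `operation`.
null_check <- "is_null"

if (requireNamespace("ffmetadata", quietly = TRUE)) {
  no_question_text <- ffmetadata::search_metadata(qtext = null_check)
  print(head(no_question_text))
}
```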

By default, the operation parameter tests for equality, but other operations can be specified. Below is a list of valid operations:

  • like: search for a pattern
  • lt: less-than
  • le: less-than-or-equal-to
  • gt: greater-than
  • gte: greater-than-or-equal-to
  • neq: not equals
  • in: is in (requires list value)
  • not_in: is not in (requires list value)
  • is_null: is null / missing
  • is_not_null: is not null / not missing

Field names that can be used in these functions

The select_metadata() and search_metadata() functions both involve searching or using the field names of the metadata variables in some way. Below are the field names that can be used when invoking these functions: