First published on the WeChat official account Python Data Science. Author: Dongge takes off
APIs and document databases often return nested JSON objects. When we load such data into pandas with Python and try to turn the keys of the nested structure into columns, the result looks like this:
import pandas as pd
df = pd.DataFrame.from_records(results["issues"], columns=["key", "fields"])
Explanation: here results is a large dictionary and issues is one of its keys. The value of issues is a list of nested JSON objects (dictionaries); you will see that nested JSON structure further below.
The problem is that the API returns a nested JSON structure, and the keys we care about sit at different levels of the objects.
The nested JSON structure looks like this.
What we want instead is the following.
Take data returned by an API as an example. APIs usually include metadata about the fields. Suppose these are the fields we want:
- key: the JSON key, at the first level.
- summary: inside the second-level "fields" object.
- status name: at the third level.
- statusCategory name: at the fourth level of nesting.
The fields we chose to extract are spread across 4 different levels of nesting in the JSON structure of the issues list, each level nested inside the previous one:
{
    "expand": "schema,names",
    "issues": [
        {
            "fields": {
                "issuetype": {
                    "avatarId": 10300,
                    "description": "",
                    "id": "10005",
                    "name": "New Feature",
                    "subtask": False
                },
                "status": {
                    "description": "A resolution has been taken, and it is awaiting verification by reporter. From here issues are either reopened, or are closed.",
                    "id": "5",
                    "name": "Resolved",
                    "statusCategory": {
                        "colorName": "green",
                        "id": 3,
                        "key": "done",
                        "name": "Done",
                    }
                },
                "summary": "Recovered data collection Defraglar $MFT problem"
            },
            "id": "11861",
            "key": "CAE-160",
        },
        {
            "fields": {
            ... more issues],
    "maxResults": 5,
    "startAt": 0,
    "total": 160
}
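To make the four nesting levels concrete, here is a small pure-Python sketch that follows a dot-separated path through a trimmed copy of the first issue above. The helper name get_path and the trimmed issue dict are illustrative assumptions, not part of the original article; the depth printed is simply the number of keys in each path.

```python
from functools import reduce

# Trimmed copy of the first issue from the sample above (illustrative only).
issue = {
    "fields": {
        "issuetype": {"name": "New Feature"},
        "status": {
            "name": "Resolved",
            "statusCategory": {"key": "done", "name": "Done"},
        },
        "summary": "Recovered data collection Defraglar $MFT problem",
    },
    "id": "11861",
    "key": "CAE-160",
}


def get_path(record, dotted):
    """Hypothetical helper: follow a dot-separated key path into nested dicts."""
    return reduce(lambda obj, key: obj[key], dotted.split("."), record)


# Print each target field with its depth (number of keys in the path).
for path in ["key", "fields.summary", "fields.status.name",
             "fields.status.statusCategory.name"]:
    print(len(path.split(".")), path, "->", get_path(issue, path))
```

The dot notation used here is the same convention pandas uses for flattened column names later in the article.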
A bad solution
One option is to hand-roll the code: write a function that looks up a specific field. The problem is that you then have to call this function for every nested field, each time using .apply to create a new column in the DataFrame.
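A minimal sketch of this hand-rolled approach, assuming a trimmed two-issue list in place of results["issues"] (the second issue is invented for illustration) and a hypothetical helper find_field. Note that every extra field costs one more .apply call.

```python
import pandas as pd

# Trimmed stand-in for results["issues"]; the second issue is invented.
issues = [
    {"key": "CAE-160",
     "fields": {"summary": "Recovered data collection Defraglar $MFT problem",
                "status": {"name": "Resolved",
                           "statusCategory": {"name": "Done"}}}},
    {"key": "CAE-161",
     "fields": {"summary": "Hypothetical second issue",
                "status": {"name": "Open",
                           "statusCategory": {"name": "To Do"}}}},
]
df = pd.DataFrame.from_records(issues, columns=["key", "fields"])


def find_field(fields, path):
    """Hypothetical helper: follow a tuple of keys inside one "fields" dict."""
    value = fields
    for key in path:
        value = value[key]
    return value


# One .apply call per nested field, each creating a new column.
# Extra keyword arguments to Series.apply are forwarded to find_field.
df["summary"] = df["fields"].apply(find_field, path=("summary",))
df["status_name"] = df["fields"].apply(find_field, path=("status", "name"))
df["status_category_name"] = df["fields"].apply(
    find_field, path=("status", "statusCategory", "name"))

print(df[["key", "summary", "status_name", "status_category_name"]])
```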
To get the fields we want, we first expand the keys of the fields object into columns:
df = (
    df["fields"]
    .apply(pd.Series)
    .merge(df, left_index=True, right_index=True)
)
After this step, only summary is available as a plain column; issuetype, status and the rest are still buried in nested objects.
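A self-contained version of the expansion step, assuming a one-issue stand-in for results["issues"] trimmed from the sample above. It runs the same apply(pd.Series) plus merge pattern and then checks which columns are still nested.

```python
import pandas as pd

# One trimmed issue standing in for results["issues"] (illustrative data).
issues = [
    {"key": "CAE-160",
     "fields": {"issuetype": {"name": "New Feature"},
                "status": {"name": "Resolved",
                           "statusCategory": {"name": "Done"}},
                "summary": "Recovered data collection Defraglar $MFT problem"}},
]
df = pd.DataFrame.from_records(issues, columns=["key", "fields"])

# Expand the keys of each "fields" dict into columns, then join back on the index.
df = (
    df["fields"]
    .apply(pd.Series)
    .merge(df, left_index=True, right_index=True)
)

print(df.columns.tolist())
# summary is now a plain string, but status is still a nested dict.
print(type(df.loc[0, "summary"]), type(df.loc[0, "status"]))
```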
Here is one way to extract the name field inside issuetype:
# Extract the name inside issuetype into a new column "issue_type_name"
df_issue_type = (
    df["issuetype"]
    .apply(pd.Series)
    .rename(columns={"name": "issue_type_name"})["issue_type_name"]
)
df = df.assign(issue_type_name=df_issue_type)
As shown above, with many levels of nesting you end up hand-writing a recursive solution, because every level needs its own parse-and-add-a-column step like the one above.
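For comparison, here is roughly what such a hand-written recursive solution could look like in plain Python. The flatten helper is a hypothetical sketch: it only recurses into dicts, leaves lists untouched, and drops keys whose value is an empty dict.

```python
def flatten(record, parent_key="", sep="."):
    """Hypothetical helper: recursively flatten nested dicts into dot-joined keys."""
    flat = {}
    for key, value in record.items():
        new_key = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            # Recurse one level deeper and merge the child's flattened keys.
            flat.update(flatten(value, new_key, sep))
        else:
            flat[new_key] = value
    return flat


# Trimmed issue from the sample above (illustrative data).
issue = {
    "key": "CAE-160",
    "fields": {
        "summary": "Recovered data collection Defraglar $MFT problem",
        "status": {"name": "Resolved", "statusCategory": {"name": "Done"}},
    },
}
flat = flatten(issue)
print(flat["fields.status.statusCategory.name"])  # Done
```

A list of such flat dicts could then be handed to pd.DataFrame; writing and maintaining this yourself is exactly the effort the built-in function below saves.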
For readers with less programming experience, writing this by hand is a real hassle, especially for data analysts who need the data in a hurry and just want structured data to analyze quickly.
Let me share a built-in pandas solution.
Built-in solution
pandas has a built-in function for this called json_normalize (pd.json_normalize).
The pandas documentation describes it as: "Normalize semi-structured JSON data into a flat table."
With this built-in function, all the code of the previous approach shrinks to just 3 lines. The steps are simple once you understand the following usage:
- Identify the fields we care about, joining nested keys with "." notation.
- Pass the nested list you want to process (here results["issues"]) as an argument to json_normalize.
- Filter the result with the FIELDS list we defined.
FIELDS = ["key", "fields.summary", "fields.issuetype.name", "fields.status.name", "fields.status.statusCategory.name"]
df = pd.json_normalize(results["issues"])
df[FIELDS]
That's right, it's that simple.
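Here is a self-contained version of those three lines, assuming a small stand-in for results (two trimmed issues, the second one invented for illustration). Nested keys become dot-joined column names, so FIELDS can be used directly as a column selector.

```python
import pandas as pd

# Small stand-in for the API response (trimmed, illustrative data).
results = {
    "issues": [
        {"key": "CAE-160",
         "fields": {"issuetype": {"name": "New Feature"},
                    "status": {"name": "Resolved",
                               "statusCategory": {"name": "Done"}},
                    "summary": "Recovered data collection Defraglar $MFT problem"}},
        {"key": "CAE-161",  # invented second issue
         "fields": {"issuetype": {"name": "Bug"},
                    "status": {"name": "Open",
                               "statusCategory": {"name": "To Do"}},
                    "summary": "Hypothetical second issue"}},
    ],
    "total": 2,
}

FIELDS = ["key", "fields.summary", "fields.issuetype.name",
          "fields.status.name", "fields.status.statusCategory.name"]

# Flatten every issue; nested keys become dot-joined column names.
df = pd.json_normalize(results["issues"])
print(df[FIELDS])
```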
Other operations
Record path
Besides passing the list results["issues"] directly as above, we can also use the record_path parameter to specify the path to the list inside the JSON.
# Pass the whole dict plus a record path instead of results["issues"] directly
pd.json_normalize(results, record_path="issues")[FIELDS]
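A sketch checking that both forms give the same table, assuming a trimmed two-issue stand-in for results (the second issue is invented). It also shows the meta parameter, which is not covered in the original article: it copies top-level values such as total onto every record.

```python
import pandas as pd

# Trimmed stand-in for the API response (illustrative data).
results = {
    "issues": [
        {"key": "CAE-160",
         "fields": {"summary": "Recovered data collection Defraglar $MFT problem",
                    "status": {"name": "Resolved"}}},
        {"key": "CAE-161",  # invented second issue
         "fields": {"summary": "Hypothetical second issue",
                    "status": {"name": "Open"}}},
    ],
    "total": 160,
}

FIELDS = ["key", "fields.summary", "fields.status.name"]

by_list = pd.json_normalize(results["issues"])[FIELDS]
by_path = pd.json_normalize(results, record_path="issues")[FIELDS]

# Both approaches should yield the same flattened table.
print(by_list.equals(by_path))

# meta attaches top-level values (here "total") to each record row.
with_meta = pd.json_normalize(results, record_path="issues", meta=["total"])
print(with_meta[["key", "total"]])
```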
Custom delimiter
You can also use the sep parameter to set the separator used when joining nested keys. For example, the code below replaces the default "." with "-".
# Use "-" instead of the default "."
FIELDS = ["key", "fields-summary", "fields-issuetype-name", "fields-status-name", "fields-status-statusCategory-name"]
pd.json_normalize(results["issues"], sep="-")[FIELDS]
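One possible reason to change the separator, shown as a sketch on a one-issue trimmed sample: with sep="_" the flattened names are valid Python identifiers, so attribute access on the DataFrame works. The choice of "_" is an assumption for illustration; it can make names ambiguous if the original keys already contain underscores.

```python
import pandas as pd

# One trimmed issue (illustrative data).
issues = [
    {"key": "CAE-160",
     "fields": {"status": {"name": "Resolved",
                           "statusCategory": {"name": "Done"}}}},
]

# Join nested keys with "_" instead of the default ".".
df = pd.json_normalize(issues, sep="_")
print(df.columns.tolist())

# Identifier-style column names allow attribute access.
print(df.fields_status_statusCategory_name[0])
```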
Control recursion
If you don't want to recurse into every child object, use the max_level parameter to control the depth. In the example below, because the statusCategory.name field sits at the 4th level of the JSON object, it will not be included in the resulting DataFrame.
# Only flatten down to the second level of nesting
pd.json_normalize(results, record_path="issues", max_level=2)
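A sketch comparing several max_level values on a one-issue trimmed sample. Keys below the cut-off are not dropped: the last flattened level keeps its nested dict as a single cell, which is why max_level=2 produces a fields.status.statusCategory column instead of fields.status.statusCategory.name.

```python
import pandas as pd

# One trimmed issue wrapped like the API response (illustrative data).
results = {
    "issues": [
        {"key": "CAE-160",
         "fields": {"summary": "Recovered data collection Defraglar $MFT problem",
                    "status": {"name": "Resolved",
                               "statusCategory": {"name": "Done"}}}},
    ],
}

# Flatten with different depth limits; None means "recurse all the way".
frames = {
    level: pd.json_normalize(results, record_path="issues", max_level=level)
    for level in (1, 2, None)
}
for level, frame in frames.items():
    print(level, sorted(frame.columns))
```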
Below is the official pandas documentation for json_normalize; refer to it if anything is unclear. That's all Dongge has to share this time.
pandas official documentation: https://pandas.pydata.org/pan...
Follow my personal WeChat official account: Python Data Science
Data science learning website: datadeepin