Perform extra validation on s3 output dimensions
Categories
(Data Platform and Tools :: General, defect, P1)
Tracking
(Not tracked)
People
(Reporter: whd, Assigned: trink)
Details
(Whiteboard: [DataOps])
Comment 4 (Assignee) • 6 years ago
Adding a proper fix, as the workaround still has some failure modes that we are hitting.
Comment 5 (Assignee) • 6 years ago
Re: comment #3
The parquet output allows the full set of safe characters described here (including the single quote): https://docs.aws.amazon.com/AmazonS3/latest/dev/UsingMetadata.html
whd: can you clarify what the specific issue is with the special characters, i.e., do we need to alter that set?
The s3 file output is more restrictive, allowing only a-z, A-Z, 0-9, period, and underscore.
Both sets are compatible with the Linux file system and S3. However, in the parquet case, third-party tools and scripts would have to be careful with the special characters. I considered standardizing the normalization, but we would have to relax the s3 file output; otherwise we risk changing some of the parquet dimensions. That wouldn't buy us much, so I think we should leave it alone.
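For readers following along, the two character sets described in this comment can be sketched roughly as below. This is only an illustration, not the actual output plugin code: the names `S3_FILE_UNSAFE`, `PARQUET_UNSAFE`, and `sanitize_dimension` are hypothetical, the `_` replacement character is an assumption, and the forward slash is excluded from the parquet set here on the assumption that it would otherwise split a dimension into several path components.

```python
import re

# Hypothetical patterns built from the sets described in the comment.
# s3 file output: only a-z, A-Z, 0-9, period, and underscore are kept.
S3_FILE_UNSAFE = re.compile(r"[^a-zA-Z0-9._]")

# parquet output: alphanumerics plus the S3 "safe" special characters
# ! - _ . * ' ( ). The slash is left out here (assumption) because it
# acts as a path separator.
PARQUET_UNSAFE = re.compile(r"[^a-zA-Z0-9!\-_.*'()]")


def sanitize_dimension(value, unsafe, replacement="_"):
    """Replace every character outside the allowed set.

    The replacement character is an assumption made for this sketch.
    """
    return unsafe.sub(replacement, value)


if __name__ == "__main__":
    raw = "Firefox/57.0 'beta'"
    print(sanitize_dimension(raw, S3_FILE_UNSAFE))
    print(sanitize_dimension(raw, PARQUET_UNSAFE))
```

The same input can normalize differently under the two sets, which is why unifying them would either relax the s3 file output or change existing parquet dimensions.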
Comment 6 (Reporter) • 6 years ago
(In reply to Mike Trinkala [:trink] from comment #5)
> Re Comment #3
> The parquet output allows the full set of safe characters as described here (including the single quote): https://docs.aws.amazon.com/AmazonS3/latest/dev/UsingMetadata.html
> whd: can you clarify what the specific issue is with the special characters i.e. do we need to alter that set?
Nope, it turns out we're fine; the issue was only due to overly long file names before/after the rename. We should be good leaving the set as-is.
> I was considering standardizing the normalization but we would have to relax the s3 file output otherwise we risk changing some of the parquet dimensions. That wouldn't buy us a lot so I think we should leave it alone.
Agreed.
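Since the root cause turned out to be overly long file names, a length check of roughly the following shape is one way to picture the extra validation. It is a sketch under stated assumptions, not the fix that landed: the `+` separator, the suffix values, and the function name `fits_name_limit` are hypothetical, and 255 bytes is the usual per-component limit on common Linux file systems such as ext4.

```python
# Typical maximum length, in bytes, of a single file name component on
# common Linux file systems (assumed limit for this sketch).
NAME_MAX = 255


def fits_name_limit(dimensions, suffixes=("",), separator="+", limit=NAME_MAX):
    """Return True if the file name built from the dimensions stays within
    the limit for every suffix it may carry (e.g. before and after a rename).

    The separator and suffixes are hypothetical; lengths are measured in
    UTF-8 bytes because the file system limit applies to bytes.
    """
    base = separator.join(dimensions)
    return all(len((base + s).encode("utf-8")) <= limit for s in suffixes)


if __name__ == "__main__":
    dims = ["telemetry", "main", "x" * 240]
    # The base name may fit while a longer temporary suffix does not.
    print(fits_name_limit(dims))
    print(fits_name_limit(dims, suffixes=("", ".tmp")))
```

Checking every suffix matters because a name that fits in its final form can still exceed the limit in its temporary form, or the other way round.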